MAMBA Reworked Updated:
A fantasy end goal for this metric would be for it to be pitched to teams and seen on the same level as EPM or LEBRON. However, there was an issue with the original version: while I still find the idea of using Time Decayed RAPM rather than regular RAPM, to create less bias and put less emphasis on the box score, very interesting, it may reduce the metric's practicality early or mid season, as the box score component as it stands may not be as powerful as Box LEBRON or EPM's box score component.
Therefore, without losing sight of the philosophy behind the metric, I worked on creating a Box Prior that could stand on its own, so the metric could work midseason with a higher decay rate and still provide a good snapshot of the current season (simply starting the decay rate out really high and lowering it as the season goes on). Perhaps, if the Prior is good enough, I could provide a supplementary single-year version as well.
Fundamentally, it first needed to work with a relatively high decay rate, so I increased the decay rate heavily:
In the original metric, the decay rate was set so that:
The last game of the previous season would be weighted at 68%
The first game of the previous season would be weighted at 40%
The last game of two seasons prior would be weighted at 28%
The first game of two seasons prior would be weighted at 17%
The current decay rate is set so that:
The last game of the previous season would be weighted at 59%
The first game of the previous season would be weighted at 28%
The last game of two seasons prior would be weighted at 17%
The first game of two seasons prior would be weighted at 8%
This creates less bias coming from the previous year, as I do want this to be a single-year metric with previous years only there to help stabilize it.
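To make the decay scheme concrete, here is a small sketch assuming exponential decay in days, with a roughly 170-day season and a roughly 120-day offseason. Those spans, the rate, and the functional form are my assumptions; they just happen to roughly reproduce the percentages above.

```python
import numpy as np

LAM = 0.0044  # assumed per-day decay rate

def game_weight(days_ago: float) -> float:
    """Weight on a game played `days_ago` days before 'now'."""
    return float(np.exp(-LAM * days_ago))

print(game_weight(120))        # ~0.59: last game of previous season
print(game_weight(120 + 170))  # ~0.28: first game of previous season
print(game_weight(410))        # ~0.16: close to the stated 17%, last game of two seasons prior
print(game_weight(580))        # ~0.08: first game of two seasons prior
```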
Heavily Reworked Prior
Originally, I believed the box score prior wouldn't be as important with Time Decay RAPM as a factor. While this is still true, I also did not want a situation where I skimped on a strong box score prior for the sake of relying on Time Decay RAPM. Furthermore, upon recreating the metric, the premise was not entirely accurate: while the test results were still generally better than EPM and LEBRON, the results themselves varied heavily depending on the Priors, often in ways that simply did not pass the sniff test. I still want this to be a bit more impact-driven, but thinking about the practicality of this metric in-season, I did want a powerful box score prior, so the data is regressed a bit more heavily to the Priors, although I'd assume still not by as much as the other All in Ones in the sphere.
Offense:
Took out POE. This was my original innovation at the time for the Prior, but I found that the overall accuracy it added wasn't worth keeping it in.
Added Transition POE, as players like Giannis and Lebron were underrated in the Prior
Added some very limited interaction effects (as they can cause some very weird individual results, I was very conservative with this and set a limit on how much they could alter the original data), and some things like Transition POE were shifted slightly, and very conservatively, depending on a player's overall POE efficiency.
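As a rough illustration of that capping idea (the cap value and numbers here are hypothetical, not the actual model terms), the interaction contribution can simply be clipped before it touches the base prior:

```python
import numpy as np

MAX_SHIFT = 0.35  # hypothetical cap, in prior points

def apply_interaction(base_prior: np.ndarray, interaction_term: np.ndarray) -> np.ndarray:
    """Let an interaction effect nudge the prior, but never by more than the cap."""
    return base_prior + np.clip(interaction_term, -MAX_SHIFT, MAX_SHIFT)

base = np.array([2.1, -0.4, 4.8])          # base prior values
raw_shift = np.array([0.9, -0.1, -1.2])    # raw interaction contributions
print(apply_interaction(base, raw_shift))  # [2.45 -0.5  4.45]
```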
Defense:
Charges Drawn was heavily inflating some bigs who drew many charges but weren't great rim protectors, though it was a very powerful predictor. While there are likely more sophisticated ways to handle this, simply setting caps on charges drawn by position, based on analysis of the dataset, ended up being a pretty solid way to do things. Bigs and bigger players were emphasized by other components anyway, so this helped balance things out to an extent.
Added Field Goals Missed Against, with a small effect where (+0.25 * blocks) is added to it. Note: I don't actually believe this improved testing results at all, but the results generally made more sense, + I did want to emphasize bigs in the box score prior.
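A quick sketch of both defensive tweaks together (positions, cap values, and stat lines are hypothetical stand-ins, not the actual thresholds from the dataset analysis):

```python
import pandas as pd

CHARGE_CAPS = {"Guard": 20, "Wing": 25, "Big": 12}  # assumed per-position caps

d = pd.DataFrame({
    "pos":         ["Guard", "Big", "Big"],
    "charges":     [14, 30, 5],
    "fgm_against": [120.0, 210.0, 180.0],  # field goals missed against
    "blocks":      [15, 110, 60],
})

# Cap charges drawn by position so charge-hunters don't read as elite rim protectors.
d["charges_capped"] = d[["pos", "charges"]].apply(
    lambda r: min(r["charges"], CHARGE_CAPS[r["pos"]]), axis=1
)

# Fold a small block effect into the missed-shots-against term (+0.25 per block).
d["fgma_blended"] = d["fgm_against"] + 0.25 * d["blocks"]
print(d[["pos", "charges_capped", "fgma_blended"]])
```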
Some of those changes on defense may not have increased overall accuracy, but I wanted to emphasize rim protectors more for basketball reasons, and within the framework of this metric I believe it helped present players in a more "in a vacuum" way, while things like the Charges Drawn cap balance the overall picture out. I made other changes and did other testing as well, but this was a brief summary of the big ones.
Testing
Here, I will post the correlations for Offense, Defense, and Overall for LebronBox and MambaBox, and below that, for MAMBA and EPM. EPM's priors are not available, but LEBRON's are, and I want to test the Prior specifically as well.
The testing was slightly different for the sake of time, and since I'm just comparing MAMBABOX to LEBRONBOX and MAMBA to EPM, the only thing I really cared about was comparative accuracy between metrics: rookies were given a value of 0, and players who played under 250 minutes in the previous season were given replacement value. When actually trying to predict with these metrics in the best way possible, rookies should be given replacement-level values, but with diminishing returns on accuracy as you get higher up, this may demonstrate the differences a tad better.
So overall, the process is the same as before but more simplified: get current minutes, give players under the 250-minute threshold replacement values, sum everything up by team, and get the R^2 vs wins. I did the same for relative offensive, defensive, and overall net ratings too. This will lead to generally lower R^2 all around than in my original test, but that's fine because I'm not trying to get the highest prediction, just to see how MAMBA and MAMBABOX stack up against other metrics.
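For concreteness, here is a toy sketch of that comparison test (column names, the replacement value, and all numbers are hypothetical; the original retrodiction version used next-year minutes and next-year wins instead):

```python
import pandas as pd

REPLACEMENT = -2.0  # hypothetical stand-in for replacement level

players = pd.DataFrame({
    "team":         ["DEN", "DEN", "BOS", "BOS", "DET", "DET"],
    "metric":       [ 6.0,   1.0,   4.5,   0.8,  -1.0,   0.2],
    "prev_minutes": [2500,   180,  2800,  1200,   900,   100],
    "minutes":      [2600,   900,  2700,  1500,  2000,  1100],
})
wins = pd.Series({"DEN": 53, "BOS": 57, "DET": 14})

# Under 250 previous-season minutes -> replacement value (rookies would get 0 here).
vals = players["metric"].where(players["prev_minutes"] >= 250, REPLACEMENT)

# Weight by current minutes, sum to a team score, and correlate with wins.
team_score = (vals * players["minutes"]).groupby(players["team"]).sum()
r2 = team_score.corr(wins) ** 2
print(team_score)
print(f"R^2 vs wins: {r2:.3f}")
```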
Box Score Prior Evaluation:
Compared to BoxLebron, the Prior here shines more offensively. Defensively it’s about a tossup. 2022 is a glaring miss for BoxMAMBA, although given that this is shared by EPM’s overall numbers, maybe it's more a statement on tracking data that year. Outside of that, the overall average accuracy is slightly higher but nothing remotely meaningful, while LebronDefense wins out in 5/9 years.
Now, I should note: when creating the Box Prior for defense, I built it to emphasize bigs more, since their impact is a bit more stable across situations, and for the basketball reason that two elite rim protectors are fundamentally different from two elite perimeter defenders, for example. This is similar to how LEBRON does it. I will say that general accuracy improves if I don't do that, on the defensive end and overall, but I think in a vacuum it might be more accurate to do it this way in terms of ranking players, as long as you incorporate things to balance it out for truly elite-impact perimeter players so you don't just have a list full of bigs in the final metric. + I think this type of approach makes more sense in conjunction with this kind of approach to the impact side of things.
As I know the person behind BBI, I don't really want to display the LEBRON results in this testing, + it would be a tad unfair because they do a lot of cool stuff with padding low-possession players which would not be represented, but the overall gap in defensive prediction between the two metrics with this testing methodology was pretty large.
Its overall performance, at a glance, seems similar to or perhaps better than BoxLazyLebron, which I mentioned was an unreleased Prior for an unreleased metric called LazyLebron, discarded because of some spurious individual results at the top. (The final results themselves weren't released, but it was noted that Steven Adams, Caruso, Delon Wright, and Clint Capela were all in the top 10 for 2022-23 as an example of an issue. That was in the final metric, not the box score prior.)
I will show the MAMBA results in the same format as before, but generally that wasn't much of an issue for my metric; I felt the current results "passed the sniff test" much better than the previous ones I published, for example, although there were still a few caveats and exceptions.
Last note: I would likely approach the Defensive Prior a bit differently, to be more precise, if/when I do a single-year version of this. I wasn't necessarily trying to squeeze out the highest prediction accuracy I could, as a few iterations and variables I excluded led to higher prediction accuracy in testing without changing results too drastically; I just felt that with the TDRAPM still being part of it, being conservative here made sense.
EPM BOX is not available. I would guess that EPM BOX is probably better than this, as it also incorporates tracking data, and EPM Defense always tested very well, outdoing the previous iteration of MAMBA's defense by a decent margin. I would likely try a more precise approach with tracking data if/when I create a single-year version, which I am more interested in now after seeing the performance of the Prior.
Overall METRIC Testing
Now, because rookies weren't given values at all (thus, they would be 0), the actual accuracy results from here on are going to be lower than had I given them replacement values, but to an extent I believe this might be better for demonstrating the predictive differences between metrics.
In general, I would say defensively it is about a tossup, but offensively and in overall accuracy MAMBA seems to have an edge. The gap is larger than it was before, likely from the slightly different methodology and, more glaringly, because rookies were given values of 0 instead of replacement values. The actual gap is likely smaller than it appears here, but MAMBA still performs a good deal better regardless.
Actual Results Breakdown
In the original writeup, I went over some results I thought were weird; here is what the new numbers say about those players.
2015: MAMBA: Lebron at 6, George Hill at 7 (EPM: Lebron at 5, George Hill at 7) (LEBRON: Lebron at 4, George Hill at 15)
MambaNew: Lebron at 3, George Hill at 5 - George Hill jumping up a few spots is a bit odd
2016: MAMBA: Lebron at 4 (EPM: 4) (LEBRON: 2)
MambaNew: Lebron at 2
2017: MAMBA: Durant 8, Lebron 3 (EPM: Durant 14, Lebron 7) (LEBRON: Durant 9, Lebron 3)
MambaNew: Durant 9, Lebron 1
2018: MAMBA: AD 15, KD 16 (EPM: AD 3, KD 15) (LEBRON: AD 8, KD 12)
MambaNew: AD 9, KD 16
2019: MAMBA: Kawhi 15, AD 10 (EPM: Kawhi 15, AD 5) (LEBRON: Kawhi 12, AD 2). Kawhi was Player of the Year, of course; the metrics are just low on him because Toronto did well without him at times, and this is an impact thing
MambaNew: Kawhi 17, AD 5
2020: MAMBA: AD 10 (EPM: AD 8) (LEBRON: AD 5)
MambaNew: AD 6
2022: MAMBA: Luka 18 (EPM: Luka 17) (LEBRON: Luka 8)
MambaNew: Luka 22
2023: MAMBA: Luka 11, Giannis 12, AD 21 (LEBRON: Luka 7, Giannis 2, AD 5) (EPM: Luka 7, Giannis 9, AD 10)
MambaNew: Luka 12, Giannis 8, AD 17
2024: MAMBA: Giannis 7, AD 18 (EPM: Giannis 4) (LEBRON: Giannis 2, AD 6)
MambaNew: Giannis 6, AD 14 (Note: originally, outside of the MVP candidates, it was PG and Mitchell above him; now it's only Bron and the MVP candidates, which seems a bit more reasonable)
Overall, as you would expect, some of the eye-popping results that were pretty general across all-in-ones remained here. Outside of Luka, the differences lean more toward what you would expect, and generally land around the EPM range. Giannis jumped up a bit in 2023 and 2024, and instead of being behind PG and Donovan Mitchell outside of the MVP candidates, he's behind Bron with some separation versus everyone else.
Outside of that: it has Jokic at #1 every year from 2022 to 2024, similar to LEBRON and unlike EPM, but is lower on him in 2021. It's generally higher on Curry and Lebron, and the pretty obvious fix is that AD is no longer severely underrated. While it still isn't necessarily high on him, and I do think LEBRON is more accurate in this case, it's more in line with EPM most years.
To wrap up, the goals were to:
Have results pass the sniff test a little more while maintaining predictive accuracy
Create a genuinely good Prior that can stand on its own, so this metric can be used in-season rather than just at the end of seasons (showcased by maintaining accuracy while increasing the decay rate)
Overall, I think I achieved that. The results hold up despite the decay rate being increased substantially, the Prior is now tested and does pretty well, and while the results aren't incredibly different, for the most part the differences make more sense, or are in line with odd results other metrics share rather than MAMBA being alone in that regard, although it's REALLY high on Kemba now. I think there is potential for this to be a single-year metric too, but for now I think it is at a point where it can still do what it's meant to do (reduce bias with less box score weight) while being usable in-season with a higher decay rate, without being too biased toward prior seasons, to get a good image of the current season. Here are the results. They aren't sorted by default although it might seem that way, so click Mamba/the Overall column to sort: https://timotaij.github.io/LepookTable/
My All in One Metric, MAMBA?
Originally, I had a very long section on the background of All in Ones, my opinions on them, and some personal caveats I had with them that flowed into the justification for why I built this metric the way I did. I originally thought this overall blog post was 7 pages of text; to my horror, I learned it was 30 pages and CTRL+A lied to me. I also learned the 2024 dataframe did not include rookies. Therefore, I will keep this much more brief, but the much more in-depth sections explaining All in Ones, delving into some reservations I have with them, the justifications for those reservations, and how that leads to the framework here, are in the really lengthy version of the blog post, which can be found at the bottom of this post.
Here is a very quick summary of that section, as it is important to understand the thought process behind creating this. To keep this post pinned, its publish date will always be set after any other post I make, but the metric was made at the start of September.
APM is the base form of RAPM (the R stands for regularization, so APM is RAPM without that), and it tries to see how impactful Player X is by measuring his effect on scoring margin while controlling for the 9 other players on the court. However, it struggles with multicollinearity: assigning the right credit to teammates who share the floor a lot. To understand RAPM and Bayesian All in Ones, think of it as a metaphor: your really insightful friend is watching basketball for the first time, and he's just screaming out how good he thinks players are (impact =/= goodness, but let's ignore that for now) by saying a number that he thinks represents that, but he just thinks everyone on the team is so great because they keep winning by 100.
Ridge Regression is typically what the "R" in RAPM is, and it involves shrinking those predictions toward 0. So you can think about it like this: as your friend is screaming out those numbers, you are there telling him every single player is actually a 0. That sounds a bit odd, but what ends up happening is that as you keep doing this, he starts realizing who the standouts truly are. Your "trolling" is going to affect his opinion on many players, and while he might look at a random player and say "hmm, my friend says he's a 0, maybe I'm overrating him," he'll watch 2018 Lebron drop a million points and think "my friend is tripping." While this helps a ton in practice, even though your friend is super smart, you only have a limited number of games to show him, so he might just not have enough film to truly parse out who's good and bad, accurately at least. The other issue with RAPM is sample size; one-year RAPM is very noisy (as impact data at that scale is in general).
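A toy sketch of the idea under the hood (tiny made-up stint data; real RAPM runs on hundreds of thousands of stints):

```python
import numpy as np
from sklearn.linear_model import Ridge

# One row per stint: +1 for each home player on the floor, -1 for each away
# player, and the target is the margin (per 100 possessions) for that stint.
X = np.array([
    [ 1,  1, -1, -1],   # players 0,1 vs players 2,3
    [ 1, -1,  1, -1],
    [-1,  1,  1, -1],
], dtype=float)
y = np.array([10.0, -4.0, 2.0])

# The "R": shrink every coefficient toward 0. Higher alpha = louder
# "everyone is a 0" heckling; the true standouts survive the shrinkage.
rapm = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
print(rapm)  # per-player impact estimates
```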
With All in Ones, instead of saying everyone is a 0, you try to help him out by saying a number for each player that represents how good you think they are; this is the box score component, the SPM, or simply the prior. This has two effects: first, it's likely intuitive how much better a baseline it is to have some separate measurement of how good each player is (so Jokic and Tristan Thompson get separate ratings you tell your friend, instead of both being 0s and equal), and second, it ends up being a nudge so your friend can get to the more accurate answers a bit faster.
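Mechanically, one common way to wire the prior in (a sketch of the general technique, not necessarily how any specific metric implements it) is to fit ridge on the margin the prior fails to explain, so shrinkage pulls players toward their prior instead of toward 0:

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1, 1, -1, -1], [1, -1, 1, -1], [-1, 1, 1, -1]], dtype=float)
y = np.array([10.0, -4.0, 2.0])
prior = np.array([5.0, 0.0, -1.0, -2.0])  # hypothetical box-score prior values

resid = y - X @ prior  # the margin the prior can't explain
delta = Ridge(alpha=1.0, fit_intercept=False).fit(X, resid).coef_
final = prior + delta  # players only move off their prior where the data insists
print(final)
```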
This makes All in Ones far superior to RAPM or APM. However, in my opinion, while this helps alleviate issues with noise, it in turn creates bias. You are applying a linear model (more on why other types of priors don't work well in the doc; just assume it's linear and ignore XGBoost and that stuff if you're aware of it) to capture trends across every player. It naturally will under- and overshoot on certain players, because box scores don't tell you the whole story. This isn't an issue in comparison to RAPM and APM overall, as All in Ones do end up being far, far more accurate, but at the same time, with some players (Lebron on defense post-Miami, likely KG), when cross-referencing with RAPM for players of "similar stature," you can see they are clearly hurt by All in Ones (at least relative to other superstars) consistently because of this bias. You create bias in place of noise: a massive improvement overall, of course, but it can cause consistent issues for some individuals.
METRIC BREAKDOWN
So essentially, for now, my metric has 2 main innovations and a few other small tweaks that I believe improve performance, which I will split into the Box Score Component and the Impact Component.
BOX SCORE COMPONENT
The Box Score Component uses typical box score data rated per 75 possessions (per-100-possession adjusted, per-36-minute style), and blends in some very conservative use of Synergy data and tracking data.
Assists were replaced with assist points created, blocks with Rim Points Saved (DFGA * DFG% Diff * 2), unassisted FGM was used, and Charges Drawn was used.
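The Rim Points Saved formula as stated, in code (the sign convention on the percentage diff is my assumption):

```python
def rim_points_saved(dfga: float, dfg_pct: float, expected_dfg_pct: float) -> float:
    """DFGA * (DFG% below expectation) * 2: points saved on shots defended at the rim."""
    return dfga * (expected_dfg_pct - dfg_pct) * 2

# e.g. 400 rim attempts defended at 58% against an expected 65%:
print(rim_points_saved(400, 0.58, 0.65))  # ~56 points saved
```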
Created a metric called SynergyPlayTypePOE: points above expectation based on a player's play-type distribution frequency and efficiency, as a way to account for shot quality, or whether a player was incredible at doing hard things. This was by far the biggest boost in the context of the box score priors.
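Here is a minimal sketch of what a play-type POE can look like (the play types, possession counts, and PPP values are made up, and I don't know the exact Synergy fields the real version uses):

```python
import numpy as np

play_poss  = np.array([300, 150, 80])      # e.g. PnR handler, iso, transition possessions
player_ppp = np.array([1.02, 0.98, 1.30])  # player's points per possession by play type
league_ppp = np.array([0.92, 0.90, 1.12])  # league-average PPP for the same play types

# Points above expectation: actual scoring minus what an average player would
# produce on the exact same play-type diet.
poe = float(np.sum(play_poss * (player_ppp - league_ppp)))
print(poe)  # ~56.4 points above expectation for the season
```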
General things, like % of games started and team Off/Def RTG * % of team minutes a player played, which are used in other priors like PIPM, were also included.
Offense was far better than defense, but the box score prior is certainly in its alpha stage, especially on defense.
IMPACT COMPONENT
The big change was here: I used an Adjusted Time Decayed RAPM where the decay started before the start of the current season and would not go beyond 2 seasons prior. (So the current season, let's say 2024, would be weighted fully, and the model would not even look at 2021 data.) Time decayed just means you weight games less the further into the past they are; this is done to account for things beyond just recency, like offseason work and improvement (or decline!).
Why do it this way; isn't it better to just look at the current year for a current-year metric? In practice, Time Decayed and multi-year RAPM with less weight on previous years is similar to PIRAPM. PIRAPM is RAPM with the previous year's RAPM as the prior (the numbers you yell at your friend) instead of 0. These results generally look much better than raw single-year RAPM, especially in the noise department.
An example here is important. Here is a pastebin post I found of 2014 raw RAPM from around the end of the regular season: https://pastebin.com/gT2aN0P5 - yes, that's Miami Lebron at 36th. This was posted by J.E. somewhere, who's basically the RAPM god, so it was done right. PI RAPM uses the playoffs too, but there Miami Lebron is first. In general, Time Decayed RAPM is also far more predictive than single-year RAPM, regardless of whether you run it raw or do luck adjustments like BBI likes to. It's actually very comparable to All in Ones, maybe even favorably so against the best ones.
I did luck adjustments on free throws (I consider FT OREBs a continuation of a possession if the players on the court do not change, so the adjustment did not hurt those players), and a VERY minor one on 3-pointers, with even less of one on offense. (Controversial, of course, but doing it without would yield the same results since I did it with such low magnitude; I think Ryan Davis's set was 50%, BBI's is a bit higher I think, and mine was like 25%, which in hindsight likely didn't change anything.)
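As an illustration of what a 25%-strength adjustment means in practice (a sketch of the general idea; the source of the expected percentage and the exact mechanics aren't shown here):

```python
STRENGTH = 0.25  # fraction of the luck gap that gets removed

def adjust_3p_pct(observed: float, expected: float, strength: float = STRENGTH) -> float:
    """Regress an observed 3P% part of the way toward its expected value."""
    return observed + strength * (expected - observed)

# A team shooting 42% against an expected 36% only gets pulled down to ~40.5%:
print(adjust_3p_pct(0.42, 0.36))  # ~0.405
```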
Fundamentally, though, there are some concerns with Time Decayed RAPM and how it can create bias, which I was worried about before as well, and which are very valid. I won't sit here and claim these concerns are completely invalid, but I will try to show with evidence below that they aren't as worrisome as one might think. For reference, Shai is 2nd in 2024 "MAMBA."
How do these things fit together
The Box Score Component reduces that past-year bias because it only takes stats from the current year. It's like a left hook as you're falling from a right hook. To go into the benefits overall and alleviate obvious potential concerns: you can kind of set how much the model regresses to the priors (how much your friend listens to your numbers). TDRAPM creates a larger, more stable sample, meaning you can set this regression to be less strict, and also create SPMs that don't have to capture every connection, whereas normally you absolutely have to NAIL the top guys being super high so it passes the sniff test + for messaging. Really, you still have to do that to an extent, but there's just less reliance there, if that makes sense. Essentially, it's a mutual enhancement, the larger sample and the SPM helping each other out.
I do think there are individual cases where the previous year being a factor may hurt; a huge concern would be whether it can capture players who made big jumps. Therefore, I got every MIP and their ranking in MAMBA (that's closer here), EPM, and LEBRON.
To be clear, this isn’t to say green = good in the sense that its more accurate, but it is more to show that “MAMBA” does not have issues with players who have big jumps or changes in performance. (Perhaps it would be better to look at players who had massive RAPM changes, but that seemed a bit less practical and a less strong/clear message) In theory, MIP = biggest jump, of course in practice that may. not be the case.
Now, I won’t handwave this concern away, it absolutely creates a bias. The way I would say its a value-add, is that you alleviate the bias that the box score or SPM can create but create some bias with this previous year being weighted.
Last thing before I show the metric's performance and accuracy in testing. Some will tout their All in One as the literal impact a player had, and many of these people are far smarter and more qualified than me. Even though I think that, even in this proof-of-concept form, this is generally at least somewhat competitive with the stronger All in Ones in the public sphere, here is my stance: just like how we use (practically) raw impact as a player evaluation tool to estimate true impact, All in Ones are all ESTIMATIONS of true impact. I'm never going to say "Player X was the 7th-most impactful player" because he was 7th in MAMBA, or any other All in One, because they are all estimations. EPM and LEBRON do the best in testing (not including MAMBA :P), but they can vary by 100 spots or more for some players. And it's not even necessarily the case that any of them are right on an individual player; maybe they all miss the plot! That being said, if they all paint a clear picture of a player's impact not matching your preconceived notion of them, that may be a sign it's worth looking into, but it is NOT irrefutable proof that you are wrong.
METRIC TESTING
To test this, I used something called retrodiction testing, with methodology similar to how the EPM and LEBRON creators tested their metrics. I did mine most similarly to Krishna Narsu's (LEBRON's creator; he used projected minutes and I used actual minutes, though he did actual minutes separately too). Essentially, I got each player's "All in One score," multiplied it by their minutes the next year, and grouped players by team. The overall sum gives you a "Team Score," and then I got the R^2 (correlation squared) to team wins the next year. R^2 = how well it explains variance, but just think of it as how well it can tell which teams are better than which for now (don't destroy me for explaining it like that, stats people, it's just for simplicity :D). Low-minute players were given replacement-player values; rookies were given a tad better than that, but still strongly negative.
How did it perform? Keep in mind, this is a stage-1, proof-of-concept version. Most of these metrics went through lengthy development and testing before they were deployed, and were subsequently edited and further improved over years of real-world use. This quite literally came from the first batch of results I made (I made 4 batches, the difference being how heavily I told the model to weigh the box scores). In this overall process, if you take out the time I spent collecting data, running the model, testing the metrics against each other (as I did not test more than once), and the longest part of all, writing this, then of the week I've spent on this, genuinely only about two days went into actually building the model itself, with very little fine-tuning.
With those caveats (excuses) out of the way: it performed the best by a very decent margin. It was the best in 5/9 years, including 2/3 years out of sample. It performed especially well in the out-of-sample years: LEBRON had an average R^2 of 59.2, EPM one of 58.73, and mine had one of 64.8. The results I got overall were similar to the ones shown on Krishna Narsu's twitter (for LEBRON and EPM), so I doubt any deviation in testing or large error occurred. (The "it would be the, Compared to what Steve suggested" post.)
It likely performed better than the unreleased metric LAZYLEBRON as well, based on the relative results. That was a BBI metric that was incredibly predictive but produced strange individual results (like Steven Adams, Caruso, Delon Wright, and Capela all top 10 in 2022) and was therefore unreleased (all of that info is from his Twitter). A multi-year version of this metric would be interesting; multi-year in this context means getting MAMBA from 2020, 2021, and 2022 and using that to test 2023 (it wasn't just LEBRON or EPM using multi-year data, it was using different years of LEBRON or EPM values), and that was the multi-year predictive thing tested in the thread below, which has Krishna Narsu's results testing the metrics. Assuming the gaps are around the same, a 0.03 R^2 gap is at least seemingly notable.
https://x.com/knarsu3/status/1763321501766627328
Now, as cool as it would be to be able to replicate this:
I will make this very clear: this is a first draft of a metric. More than that, I now have much more appreciation for what goes into making All in Ones, and for the balance between "predictive accuracy" and "players have to make sense." To be fully transparent, here are the general things I think need improvement:
I want more emphasis on bigs; perhaps separate models? I think doing a KNN to get different groups of defenders sounds like something that sounds nice but won't actually work practically with this kind of model.
It likely undershoots AD and Giannis defensively. I want to incorporate blocks into the Rim Points Saved metric somehow; having them as separate predictors causes multicollinearity shenanigans, and interaction effects create no-nos, but I can think of a few ways to incorporate it. (So blocks aren't currently part of it, just Rim Points Saved.)
Synergy POE was a super good addition, but it might undersell guys who are efficient because they create crazy good opportunities with their movement versus their team doing it for them; AD and Wemby provide unique value there. If I could separate rolls and pops that would be nice, but I don't have the Synergy API.
It will generally struggle to identify really good players on teams that can be elite in the regular season without them, like KD on the Warriors and Kawhi on the Raptors. Because it is less reliant on box scores than some other All in Ones (I guess that's somewhat of a niche it fills lol), certain players like that might be undervalued. This is kind of true for all metrics, but still.
This is the first run; playing with different box score weightings and decay values is on the agenda, but I want to focus on the SPM portion of the metric, particularly the defensive part, because that is certainly still in stage 1. Perhaps different box score weightings for defense and offense too?
Strange Individual Results
All in Ones always have some weird results at an individual level, so before this goes into the trash because Player X ranked a bit weirdly, I briefly went through some weird ones off a very cursory look and compared them to other metrics. If they all have a similar weird result, that isn't to say they're all right; it's to say it's an All in One thing rather than a flaw with MAMBA specifically, though there are certainly some players I think my metric will "get wrong" more than the others do, which applies to all of them of course. Here is a breakdown of some odd results and some general commentary.
2015: MAMBA: Lebron at 6, George Hill at 7 (EPM: Lebron at 5, George Hill at 7) (LEBRON: Lebron at 4, George Hill at 15)
Note: 2015 to 2017 was interesting because Lebron shot up the less I weighed the box score, so kind of a "this type of stuff undersells him" vibe. He was overall #1 taking the 3 years together in the impact part of it by a lot (Curry 15-17 was the #2 stretch from 2014 to 2024, I think, or something like that). Also, it gets AD very wrong; it's very low on him for some reason, which I disagree with.
2016: MAMBA: Lebron at 4 (EPM: 4) (LEBRON: 2)
2017: MAMBA: Durant 8 (EPM: 14) (LEBRON: 9)
Durant obviously should be much higher; this mostly comes from the Warriors doing well when he was off the court.
2018: MAMBA: AD 15, KD 16 (EPM: AD 3, KD 15) (LEBRON: AD 8, KD 12)
I think AD should be top 5 with the RS he had, but yeah, mine is insanely low on him for some reason; it's a flaw, I think.
2019: MAMBA: Kawhi 15 (EPM: Kawhi 15) (LEBRON: Kawhi 12). Player of the Year, of course; it's just low on him because Toronto did well without him at times, and this is an impact thing.
Same with AD here; mine is insanely low on him for some reason throughout these years.
2020: MAMBA: Kemba 8 (EPM: Kemba 43) (LEBRON: Kemba 23)
This is a "what the hell" result, like EPM having Nurk or Zu top 10 in some years. LEBRON actually does a good job of not having really odd players top 10 consistently.
2022: MAMBA: Luka 18 (EPM: Luka 17) (LEBRON: Luka 8)
All of these undershoot him, but EPM and MAMBA absurdly so; LEBRON does the best here, but obviously Luka is Luka.
2023: MAMBA: Luka 11 (LEBRON: Luka 7) (EPM: Luka 7)
Same as the prior year, except mine stands out at undershooting him.
2024: MAMBA: Giannis 7 (EPM: Giannis 4) (LEBRON: Giannis 2)
Relevant: MAMBA Defense: Giannis +1.2 (87th); EPM Defense: Giannis +1.8 (72nd); LEBRON Defense: Giannis +0.9 (64th)
EPM and LEBRON are both reasonable rankings; mine is not. There's a really obvious glaring issue with Mitchell above Giannis on defense in mine; on EPM they are actually fairly close (Mitchell at +1.5, Giannis at +1.8), and in LEBRON they are further apart with Mitchell at +0.2 and Giannis at +0.9, but all of them are a bit low on Giannis defensively this past year (which I don't agree with, to be clear). That being said, I would say mine did a good job ranking his defense during his DPOY seasons and 2021. On EPM: 2019 24th, 2020 9th, 2021 64th. On LEBRON: 2019 3rd, 2020 7th, 2021 5th. On MAMBA: 2019 8th, 2020 2nd, 2021 5th. EPM stands out as a bit odd there; LEBRON and MAMBA do similar jobs, with MAMBA probably undershooting his 2019 DPOY campaign and LEBRON undershooting his 2020 DPOY campaign. Knowing how his raw RAPM and impact data looked during this stretch, I would argue the closer to 1 or 2 the better, personally. Not sure what to make of this.
Beyond AD being off in a lot of these, it has Jokic way too low in 2021, at like 7th, but has him #1 every year since by much more than the other metrics.
This might look bad, but I'm literally looking through for things I find stupid about mine; you could likely do that for every metric (I think 2024 Curry is like 25th in LEBRON? 12th on mine and on EPM, though LEBRON generally looks really solid at the top for sure). The point isn't to disparage anyone or any number; the point is that all of these metrics have some weird individual results, mine included. I absolutely think on many of these mine is just flat-out missing; hopefully once I edit the box scores that will be mitigated. Pretty much all of these players (aside from Kemba and Hill) I consider top 1, 2, 3, or 4 in those years.
I think All in Ones are fantastic tools, but they aren't a "how good is this guy" metric. A guy ranked way lower than expected on a team that functions well without him isn't necessarily a bad sign for that player, because impact comes just as much from "they get way better when you're there" as from "they suck when you sit."
There is seemingly a tradeoff between how accurate/predictive a metric is and having some really odd individual results. Based on the accuracy tests at face value, I think the tradeoff is worth it, at least considering what seemingly happened with LAZYLEBRON's results.
A WNBA version of this exists, and it is not public because I am in an internship (and hopefully will become full time!). It tests better than anything available, but for the WNBA specifically I much prefer LEBRON right now for the low-sample padding.
In terms of not having glaring players at the top or a top-tier guy way too low, I think LEBRON does the best job there, off a quick glance.
I'm pretty fine saying the "misses" here are simply misses caused by bias, or just noise clouding reality. Many people in the analytics space might push back on me holding that opinion when all the All in Ones agree on something, but that's just my personal take in some situations where it doesn't pass the sniff test. I do think those discrepancies are worth looking into when they exist; it's just that All in One results that really deviate from general opinion are more a potential signal than some sort of proven, irrefutable answer, if that makes sense.
FINALLY THE IMPACT METRIC
WEBSITE FOR INTERACTIVE TABLE (Preview Below) https://timotaij.github.io/LepookTable/
NOTE: Players who played under 200 minutes in a season may not be shown correctly, but that was not a problem for the metric testing
So what does this mean? Did I create some new super metric or whatever that towers over the competition?
NO
Testing and out-of-sample testing is cool and all, but at the end of the day it isn't the same as legitimate real-world results after release. To be clear, this isn't a case where I kept rebuilding the model and running it over and over until I got good correlations; this was all within the first batch of results I got.
More than that, this is the "proof of concept" phase. My hope is that some of the odd individual results will be fixed as this metric gets refined. I have things I blatantly need to work on: the offensive results were great in testing, but the defensive results weren't quite as stellar (I did retrodiction testing for offense and defense separately). They were still good, but a good deal worse than EPM's, and defense was where I thought this would shine. This is likely because of issues with my SPM; this methodology means it can handle a weaker SPM in general, but that is not an excuse to have one that is not particularly good.
For me, this serves best as a "proof of concept" of this type of framework. In its current stage, based on its testing, I would tentatively say it is comparable to things like LEBRON or EPM, even though it may have some odder results at the top end from time to time. Maybe it does better at predicting players outside the top 10? But EPM and LEBRON are essentially perfected versions of what they are within their frameworks, while this is very much scratching the surface of its framework. Each one will have certain flaws, misses, and biases, but beyond some exciting test results, I think it's valuable to know more clearly where those biases come from, even if the metric has less overall "incorrect" bias in that regard (the last-year weighting being the known source here).
Obviously this wasn’t the most formal post, but yeah any questions comments concerns or if you just wanna reach out. timothycwijaya@gmail.com, Timotaij on instagram, Teemohoops on twitter, and my linkedin of Timothy Wijaya are probably the best places to reach me.
Note: Caveats about All in Ones from a more philosophical standpoint are beyond the scope of this post, but that's a very interesting discussion.
Note: This list does not represent how I would rank players, AT ALL.
Note: As I said, this is a first draft of a metric.
Note: The public sphere matters here; I'm sure teams have better versions of these in-house.
The WNBA version was done without the Synergy stuff, which I will have to add by downloading CSV files manually, and of course has no tracking data. It performed better than any All in One in the WNBA scene by a large margin, although I prefer LEBRON for the WNBA (LEBRON is not readily available for the WNBA; it's similar in predictive accuracy there).
The long version of this is below (it is unedited, has many grammar errors, and is generally much more rambling and less professional). Perhaps just the first part, with the explanation and breakdown of All in Ones and the justifications for the caveats I have with them and the biases I mentioned, is useful though. https://www.teemohoop.com/mamba-or-lepookie
Huge thanks to Seth Partnow and Ben Alamar for giving insights during the Las Vegas SBC program (we did not talk about this or anything, but they helped me a ton with how to approach my internship), to Eli Horowitz for giving me a chance with the internship opportunity, as I likely wouldn't be able to do anything in basketball if it was not for that, and to Nathan Hollenberg for helping me with some questions I had on RAPM samples and for all the wonderful advice he gave me during our coffee chat!
Four Factor RAPM
Something I thought would be fun to do was Four Factor RAPM. It ended up being pretty straightforward, so I did it last night after dinner and ran it overnight. RAPM can be seen as a fancy way to parse a player's impact from his teammates', but some consistently weird results can pose more questions than answers. For a full breakdown of RAPM and some concerns I have with it, see the first few sections of this post: https://www.teemohoop.com/mamba/Blog%20Post%20Title%20One-mm8gk
If you go there or are visiting for the first time, maybe check out the All in One I made; the updated (non-crazy-long) writeup is available at the top of the link I sent, or you can go to it directly here: https://www.teemohoop.com/mamba/Blog%20Post%20Title%20One-mm8gk-cy9wh
Most of this is copied and pasted from LinkedIn, but with a few extra examples at the end:
Why does Jokic’s defensive RAPM always end up being so good?
Why are 2023 and 2024 Embiid's and post-GSW Durant's ORAPM not quite as high as the top offensive players' in the NBA?
Why does Caruso’s defensive RAPM always look like he's the greatest bald guard defender in NBA history?
A flaw of RAPM (beyond noise and ignoring context) is that it doesn't explain the "why."
So, many treat it as a signal for whether something is worth analyzing or looking into. While I hold this view of this kind of data, I thought making it point out the "why," when it's not noise, might be possible to an extent.
Last night I thought it would be interesting to make "Factor" RAPM: instead of getting impact on point differential, you get impact on TS%, rebounds, and turnovers. (Think the four factors, but combining FTR, EFG, and FT% into one.) You end up with:
OREB impact
OTOV impact
OTS impact
DREB impact
DTOV impact
DTS impact
You then scale these so they generally add up to Offensive and Defensive RAPM. I did this by running a linear regression of the component impacts on the corresponding RAPM, getting the coefficients, and multiplying each component by its coefficient. I set the signs so a positive number is always good and a negative number is bad (positive DTOV = forces more turnovers). They don't add up perfectly, but generally come pretty close (0.1 away on average).
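Here is a minimal sketch of that scaling step with synthetic numbers (in reality each factor impact would come from its own RAPM run with that factor as the target; everything below is made-up data, just to show the regression-and-rescale mechanic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500  # synthetic player pool

# Stand-ins for the raw offensive factor impacts (OREB, OTOV, OTS) and ORAPM.
oreb, otov, ots = rng.normal(size=(3, n))
orapm = 0.4 * oreb + 0.6 * otov + 2.0 * ots + rng.normal(scale=0.3, size=n)

# Regress ORAPM on the raw factor impacts, then rescale each factor column by
# its coefficient so the pieces are denominated in ORAPM points.
X = np.column_stack([oreb, otov, ots])
coefs = LinearRegression(fit_intercept=False).fit(X, orapm).coef_
scaled = X * coefs  # each player's per-factor contribution

# The scaled pieces add back up to ORAPM only approximately (~0.1 off on
# average in the real data, per the post).
print(np.abs(scaled.sum(axis=1) - orapm).mean())
```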
Practically, how does this help with questions from earlier?
With Jokic: as much as his hands/positioning/general defensive IQ are praised, the entirety of his positive defensive impact pretty consistently comes from his insane impact on the defensive glass.
With Embiid and KD: I've heard arguments that it's from a lack of playmaking, but there isn't much evidence here suggesting that's what's behind their lower-than-expected offensive RAPM. Durant's impact on TS% was better than Jokic's from 2021-2023, and Embiid was 3rd in his MVP year and 2nd this year, despite his offensive RAPM being outside the top 10 both years. It's mainly from negative impact on the offensive glass, especially for Embiid.
With Caruso: He led the league in “Defensive Turnover Impact” for the last 3 years
I caution against making sweeping conclusions off this, or off raw RAPM in general, especially with two-year samples, but I believe there are interesting practical takeaways from this type of analysis, at least when it's not just noise. Sometimes you can even see whether a player's ranking might be dubious for something out of their control: if a perimeter player who doesn't crash the glass has an exceptionally high offensive RAPM one year because of OREB impact, and it's a one-year thing that isn't consistent year over year, maybe it's noise or luck.
It wouldn't be difficult to add more years, or change the year bounds, as long as the NBA PBP possession format from the API stays the same.
data from 2016 to 2024 (2 year stints): https://timotaij.github.io/FactorRAPMScaled/
Raw version (unscaled to RAPM) from 2016 to 2024 (2 year stints): https://timotaij.github.io/FourFactorRAPMRaw/
Running a WNBA version of this is INCREDIBLY noisy because, well, RAPM is incredibly noisy at those sample sizes, but there are some interesting practical takeaways from that too; that's private though lol.
I haven't done anything on MAMBA since that blog post, but I might take out the offensive and defensive team * minutes effect that was originally there.
Long, Unedited Version of All in One Post
NOTE: If you are reading this for the first time, please go to this link instead:
https://www.teemohoop.com/mamba-or-lepookie/Blog%20Post%20Title%20One-mm8gk-cy9wh
What are All in Ones, What actually Goes Behind them?
So over this week, I made my version of an all-in-one metric I had conceptualized awhile back, at least a first draft of one, and I will get into that below. But I think it's important to explain what an All-In-One metric is too. Most explanations online either cut a million corners, or it's an online statistician saying "it's just this simple formula :D" and pulling out a giant equation.
Since I hope some people from Vegas are reading this, as well as maybe anyone else in the basketball scene who stumbled on my LinkedIn post, I think it's worthwhile to fully explain, in a simpler and easier-to-understand way, what All in One metrics really are, so you can form your own opinion on them. This won't be the most formal intro to a stat or metric you've ever seen, but I hope it's interesting. Just skip ahead if you want to get to the breakdown of the metric itself, but I would say the parts on what the number is comprised of and how it tests are kind of important, and that section flows nicely from this one.
Before getting into All in One metrics, it's important to understand RAPM, the backbone of most All in One metrics. The "APM" in RAPM stands for Adjusted Plus Minus, which takes into account 11 things: the player of interest, the 9 other players on the court, and the scoring margin. It tries to see how the player of interest affects the scoring margin while controlling for the 9 other players as factors. However, the issue with APM is multicollinearity, which simply means it has a hard time distinguishing between teammates who play in many lineups together. Basically, it struggles to assign the right amount of "credit" to people who play together a lot; because I am on the court with Lebron, it gets "tricked" into thinking I'm really good at basketball, and not that Lebron carries me.
This is where the "R" in RAPM comes in, to mitigate this issue and distinguish between teammates who share the floor a lot. It can help say Lebron is the one carrying the team, and that 2018 SNL recruits Childish Gambino, Pete Davidson, and Kenan Thompson weren't actually secret top-tier players; they just played with Lebron in many lineups. R stands for Regularized, and for RAPM it generally means Ridge Regression. Now that sounds all fancy, and this is usually where someone throws a giant math equation at you or says "just know it does this," but it's actually pretty important to get this part to really understand how and why All in Ones work, so I'll give a real-world example to illustrate.
Imagine you're a teacher and you have 2 troublemaking twins, one named Marco and the other named Kenji, but you know one of them is the "bad apple" and the other is just following their lead, similar to how Lebron is carrying the team to high scoring margins and random player "Justice Young" is just along for the ride by being on the court with him a lot. One of them starts shouting and the other one follows, and you want them to stop and to figure out who the bad apple is. You get all fake-dramatic and yell at them to shut up, and you keep doing this over and over again, knowing the not-so-bad kid will feel bad and chill while the more consistent problem child will continue to troll. Eventually, if you keep doing it enough, Kenji starts feeling bad and chilling out since he wasn't really like that, while Marco just keeps screaming since Marco sucks. You've learned Kenji was chill and Marco was the real troublemaker.
Practically, ridge regression is a similar concept. Instead of kids yelling, it's their "scoring margin values" (how impactful they are); you're pushing the values to 0, and instead of the yelling, it's high scoring margins. So think about it this way: you are trying to find the troublemaker between the twins Marco and Kenji (finding the "driver of the high scoring margins" between Lebron and Justice Young, who often play in lineups together), so you start punishing them and telling them to quiet down (shrinking coefficients to 0), and as you keep punishing them, Kenji, who wasn't truly a problem child but just following Marco, begins to behave, while Marco continues his mischief and is less affected by the punishments (Justice Young's value starts going down; Lebron's stays high and is less affected by the punishments, because his "effect" is more consistent and is driving the margins more).
The main confusing thing there is shrinking coefficients (player values) to 0, but just view it as "punishing" and you basically get the gist. One better way to explain it: imagine instead of an advanced model, it's a guy screaming out numbers for how good he thinks Lebron and Justice Young are every time he sees them play, and every time he does this, you tell him they are both 0/10 players (you're a hater). His "screams" are the coefficients (model guesses; while not a perfect representation, you can view it as the model repeatedly guessing how impactful players are until it's happy with its choice or runs out of film to watch), and you telling him everyone is a 0/10 player is "shrinking his ratings toward 0." His ranking of role player Justice Young is going to be more affected by you trolling him into saying he sucks than his ranking of the greatest player to ever pick up a basketball, where he can clearly see the greatness.
And that's pretty much RAPM! Why was it important to understand for All-in-One data? Well, there are multiple forms of All in Ones, but the one I made, LEBRON, EPM, and ESPN RPM back when it was created (its creators have since left for NBA teams; I've heard the metric is a bit weird now since they left) are "Bayesian prior-informed RAPM." That sounds super fancy, and I have absolutely no idea why people don't ever explain it normally, because it's actually simple (like genuinely, not in the "it's suuuper simple" way followed by throwing an alphabet with an equals sign at you).
Instead of punishing everyone's values toward 0, you "punish" each player's number toward how good you think that specific player is. This is where the box scores usually come into play: you use box scores to create a number for each player that gives a rough estimate of how good that player is. This has a HUGE effect on RAPM.
If that's confusing, you can think of it this way: imagine RAPM is your friend who has never watched basketball before, trying to learn about basketball in a limited amount of time (in this case, "time" is the possession sample it has to learn from).
Without Regularization, he just thinks everyone on the 2016 Warriors was a 10/10 because they won by 50.
With basic ridge regression, which pushes the values to 0: when he said he thought they were all great, you kept saying they all actually suck, and he kept watching and said "hmm, I guess some of them weren't as impressive as I thought, but that Curry guy was pretty good though!"
With Bayesian regression, as he is watching, you are giving him your complete honest opinion on how good every single player is, and you keep repeating your opinions on those players instead of saying they are all 0s.
This number you are saying to him is the "PRIOR."
You see the difference? Keep in mind your friend is a super genius and will pick up on things eventually, but he's a bit slow and just needs a lot of film, or a bit of a nudge. With limited time, getting him closer with those good opinions really speeds up the process, as often he won't have enough time to get the answer right on his own.
That, in a nutshell, is what much of All in One data is, at least a large proportion of the best ones: they create a number representing how good a player is using box score data, and that becomes the prior you scream at your friend watching the game, over and over again.
Caveats to this Approach in All in Ones:
It all sounds really nice, but there are some practical issues, 2 of which I will put down here in my opinion (I'm not going to get into the caveats with using this type of approach for evaluation for now):
1) It takes a pretty big sample for your friend to truly get players right.
2) The priors themselves (the opinions you're telling your friend watching the game) can skew his opinion in incorrect directions.
My version tries to tackle these in its own way (in the week I made it lol), but here's a more in-depth explanation of the issues, to demonstrate why I felt they would be interesting to tackle this way. Feel free to skip this if you don't really care.
(For this explanation, think of noise = things distracting from the true value; imagine you're trying to memorize a song's lyrics, but a baby starts screaming at the same volume, so now you think the song has some crying in it.)
The friend example was good as a visualization and to demonstrate things in a more human way, but I'm throwing it away from here because it takes away from what RAPM in its raw form is and the benefits of it. It is an impact metric that only attempts to parse out the impact Player X has on his team's scoring margin, accounting for the 9 other players on the court. It cares about NOTHING else. Simply put, it's unbiased. Sample size is an issue and short-term RAPM is noisy, but some people mistake this for "RAPM just doesn't say anything valuable in small samples." RAPM is a measurement of raw impact, which in itself is used as an estimate of "true impact." What's the difference? Raw impact is simply the points when you go on and off the court, adjusting for teammates; true impact is whether you are actually the reason, or a factor, behind that margin, or whether it's coincidence you were there when something good happened (you happen to be there when good things happen that you didn't affect directly or indirectly in any real way). A lot of raw impact is simply noise, but it isn't necessarily always noise, which I think is a key distinction.
With low-but-reasonable-sample RAPM (let's say a season), you do get a ton of wonky results, but much of that "noise" is simply the instability of short-term impact data itself; most of the time (key word: most, as in more than half the time) you aren't going to see wonky results that aren't already apparent in the raw impact data when you look at a player among their teammates.
This created an interesting debate in some places back when All in Ones first came out (I was like 15 at the time), but from what I remember some people were unhappy, saying All in Ones killed the point of this kind of thing. To be clear, I disagree with that take, but I understand where it's coming from. With the priors, you end up reducing noise but creating bias, and on the whole this tradeoff is 100% worth it. It's just an issue in some individual cases at times, which I'll get into below, and as a whole it matters more when people get too fixated on marginal differences and rankings between players.
The second issue is that the box score prior itself isn't so simple to make. The way it is made is you get stable samples of RAPM, train a model that takes inputs (box score numbers) to predict a player's RAPM, and make that the prior. If you're part of MSBA reading this and on the more technical data side, you might think "XGBOOST," but no, that doesn't work, because from my understanding the errors in non-linear models tend to be unacceptable, and in my brief experience running it for this it was awful. Even interaction terms create large, unacceptable errors at an individual level. For a draft model, sure, boost away; I even did one for my internship and my portion was pretty solid (I think I used XGBoost or LightGBM, I don't remember tbh), but not for this kind of thing.
You WANT outliers, at least for the really good players: you want the prior to "overshoot" on certain players and superstars in some years, to stabilize things when noise causes some players to be underrated. On a more meta level, you want some players to be overshot for the sake of the metric looking respectable, and for the sake of messaging, to be honest. If the ONLY goal were getting a high prediction on RAPM, this would be easy, but you have to have some semblance of common sense in your results. That isn't to say deviation from general opinion is wrong; having a guy like Caruso in the top 20 or something is completely fine in my opinion when his impact signals are THAT strong (All in One metrics are NOT a ranking of how good players are in a vacuum, to be clear), but if your list has a bunch of role players in the top 10 and superstars out of the top 50, something is probably wrong. That being said, if certain players are consistently far from preconceived notions of where they would rank, and 97% of others aren't, that's a valuable data point, though people often draw too-strong conclusions from it. A box score prior does help RAPM become far more stable, and helps create a final metric that isn't laughed out of the room. But here's the thing: it's a linear regression, you are applying a generalized pattern to the entire NBA, so you ALWAYS are going to overshoot or undershoot on certain players. I have gotten pushback on this statement before, but while it undoubtedly creates better observations at a GENERAL level, there are CERTAINLY some players who are overpushed or punished at an INDIVIDUAL level. For me, while I do find All in One data valuable, I don't view it as a raw measurement of impact like some do. While RAPM has noise, All in Ones have bias. 99% of the time, that small amount of bias is worth it and helps a CRAZY amount, but that bias can also lead to fundamentally incorrect conclusions at the individual level, where perhaps the noise wasn't actually far from reality. To me, both have their place when analyzing a player, All in Ones much more so, especially if you can only pick one. But also, watch the game lol.
The next two paragraphs are a slight case example with LeBron; you can skip them if you want.
A case example: a while ago, I saw a pretty bad article on bball-index.com. Now, I really enjoy the site and like what it stands for, and to be clear, this WAS NOT WRITTEN BY TIM (also known as Cranjis McBasketball). Tim's a smart guy and pretty chill to talk to, so he wouldn't write something like this. But the gist of the article was one of the other writers clickbaiting off the Olympics with a "LeBron's not top 10 and I'll tell you why with FACTS and STATS" piece that was just a guy pulling out the LEBRON metric…
But it actually is relevant here, because LeBron is probably the clearest example (that I know of) of a high-profile player who exposes this bias. I don't want to go on a ten-page tangent defending LeBron's honor from LEBRON-on-a-spreadsheet in caps lock, but what I'll say is this: especially on the defensive end, for pretty much his entire post-Miami career (at the very least), any available box score component for an all-in-one severely undershoots LeBron defensively. The two exceptions, 2018 and 2022, are the only years where his actual raw defensive impact data wasn't good (according to RAPM). This is the case for LEBRON, DPM, and mine (I'll release the overall numbers; I can give the priors to anyone who asks, but this is a first draft and still needs tuning). On a deeper level, despite his great box scores, what you see fairly consistently is that the more you weight box scores, the less impressive his all-in-one data gets. This doesn't mean "hey, maybe his impact data overrates him," because that's really not how it works when it's this consistent over long periods for a high-production player; it means LeBron is better than his box score production indicates. To be clear, LeBron's career age-adjusted impact data is by far the greatest in history, and if you only look at playoff RAPM (there are caveats to doing it that way, beyond the scope of this post), he's basically a lone dot at the top even without adjusting for age, and that's with him being in LeCoast mode in the regular season since 2014. All-in-one data ironically shrouds the case here, but for his career LeBron is pretty much the undisputed king in the realm of impact data (although obviously he's no longer the undisputed #1 there now). I'm sure there are other examples (I feel KG would be another?), and sometimes this is by design (LEBRON tends to give extra weight to rim protection from my understanding, which helps its predictive value, since top-tier big defenders are better building blocks than top-tier perimeter defenders, even if it might not show up in raw impact stats for some of the non-DPOY-tier bigs), but you get the point.
End of Lebron stuff
The box score prior is where a lot of the separation between these metrics happens. It's actually where people do unique stuff, but overall I think of an all-in-one as an estimation, while some treat it as the answer. I don't know how good my metric is or how the final version will be (I'll show the results of my retrodiction testing down below; it actually performed super well, in and out of sample, but I still have a lot to work on, I literally started doing this 5 days ago and 2 of those days I was out and about). But regardless of how good this metric ends up being, I don't think I'll ever phrase a result like "Player X was a +7 player in impact because my number said so," because all it means is that my estimation puts him there. There's a 0% chance I agree with any of these metrics exactly; I mean this one (spoiler alert) hates AD, and as a huge AD fan there is quite literally a 0% change in my opinion on that man lol. My estimation and EPM tend to not love AD while LEBRON has him around top 5, whereas mine and EPM love Bron and LEBRON has him at like 19th. It's just how these things go sometimes.
Personally I think both of them are easily top 10, and top 5 in the playoffs (#1 and #2 this year btw, with a young Pat-Riley-with-a-calculator presence and drip at the helm), but I live in LA (although I'm willing to relocate for any WNBA or NBA team if I can't get a return offer, pls, I'm desperate lmao, look at all this, I will literally work on a fry cook salary to make up for the visa lol).
A little side note: RAPM also tends to run a bit differently in Python vs R. I know J.E.'s RAPM and the Ryan Davis luck-adjusted RAPM are very different from the one on BBI. BBI has recently done more complex stuff with their 3-point shooting luck adjustments from what I know (some people love it and some hate it; not getting into that yet). But I honestly feel it's a bit weird to see already-luck-adjusted O-RAPM from Ryan Davis have Jokic as the clear #1 and Giannis around 5th-6th over the last 2 years, while BBI's O-LA-RAPM has Giannis a country mile 1st on offense and Jokic like 3rd and 6th.
This isn't to say "NYEHEHEHE, they did it wrong!" It's just the biggest single jump I could think of. Getting into which RAPM set has the most "errors" is a dicey proposition and I'm not opening that can of worms, but my main point is that some of the changes seem too dramatic for a slight adjustment ON TOP of the luck adjustments to an already luck-adjusted set, especially when the testing I've seen (shown later) seemed to indicate those adjustments didn't provide significant improvements. At the very least, the weight of the assumptions being made has to be considered against the practical results if they cause jumps this big. To be clear, I LOVE the LEBRON metric and think it and EPM are relatively close and both the undisputed top right now.
SKIP HERE FOR THE METRIC ANALYSIS AND BREAKDOWN:
Now that that part is done, it's time to get into the "fun" part: what's my metric?
First, I would like to thank Nathan Hollenberg, Seth Partnow, and Benjamin Alamar. I didn't talk to Mr. Partnow or Mr. Alamar about this metric specifically, but I got to talk to them a bit during the Vegas seminar (just about data as a whole) and they were super smart, cool, and insightful to listen to. I had a coffee chat with Mr. Hollenberg; I had a bit of a plan for the metric by then, and he gave me advice on how long the RAPM sample should be, and I think the reassurance that I wasn't just being insane and that my thought process wasn't absurd was a big push of confidence. The coffee chat was super cool, he was a really nice guy, and I learned a ton about how to approach all of this; it kind of made me think, hmm, this might actually be a cool idea. His advice also helped me a ton at being better at my internship!
I would also like to thank Tim (I feel weird calling him Cranjis, and I learned his whole name by accident and there is ZERO percent chance I'm ever revealing that information, so I'm going to say Tim), who generally helped me out a lot in terms of combining data with Xs and Os stuff, which I still think I'm probably better at than the data stuff, although doing this project over the past few days was pretty fun. And Jeremias Engelmann, for being a fantastic resource through his postings on APBR, and also for being the reason I found RAPM when I was 15 through his Dropbox links lol.
Of course I have to thank Eli Horowitz, because if I hadn't gotten this Sparks internship I probably would have had to shift gears by now, and the Sparks experience has been absolutely amazing; I've really fallen in love with the entire process of being in an analytics department. I could write a whole essay on how that's been such a life changer for me (well, unless I'm deported within the next 90 days), but this is already going to be crazy long lol.
So, the results will be down below, but first I'd like to go into the value add of what I made, based on what I said about all-in-ones previously. Like I said in my LinkedIn post, this is a rough draft, but overall it tested very well when I compared it to EPM and LEBRON. My testing of EPM and LEBRON was based on the methodology Krishna Narsu (LEBRON's creator) shared on Twitter.
METRIC BREAKDOWN
So essentially, for now, my metric has 2 main innovations and a few other small tweaks that I believe add improvement, which I will split into the Box Score Component and the Impact Component.
BOX SCORE COMPONENT
The Box Score Component uses typical rate-adjusted box score data (per 75 possessions, with per-100 and per-36 versions), blended with some Synergy data and tracking data.
Now, before you click away at the mention of tracking data: I was fairly conservative with it. From the tracking data, I came up with Points Saved at the Rim, which I computed simply as the FG% differential when contesting, multiplied by defended attempts at the rim, times 2. I also used charges drawn (with 2015 and 2016 charges drawn coming from PBP Stats instead of the wonderful NBA API). I also used Defensive Field Goals Attempted, which I thought of as "how often you were the closest defender." Tracking there isn't perfect, I can attest to that from the Second Spectrum stuff, but I still think it's likely a barometer of activity, and it did help. A rough sketch of the rim calculation is below.
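For illustration, here is roughly what that rim calculation looks like. The column names are assumptions about the data layout, not the actual tracking-feed schema.

```python
import pandas as pd

# Hedged sketch of the "Points Saved at the Rim" idea described above.
# Assumed columns (placeholders):
#   rim_dfga     = defended FG attempts at the rim (closest defender)
#   rim_dfg_pct  = opponents' FG% on those rim contests
#   rim_base_pct = the expected/baseline FG% on those same attempts
def points_saved_at_rim(df: pd.DataFrame) -> pd.Series:
    # (expected FG% - allowed FG%) * attempts * 2 points per rim make
    return (df["rim_base_pct"] - df["rim_dfg_pct"]) * df["rim_dfga"] * 2
```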
Offensively, assists were replaced with Assist Points Created (so including free throws and 3-point shots instead of just raw assist totals), and unassisted field goals were part of it as well.
Synergy was only used here, and only for these two things: PLAYTYPEPointsAboveExpectation and OVERALLPointsAboveExpectation. It's essentially a way to control for shot quality and shot diet: I took a player's points and subtracted his "expected points," which is his playtype diet multiplied by league-average PPP for each playtype. That way you can gauge how efficient players are given their playtypes, or whether they're finishing tough shots at a great rate that doesn't show up in raw efficiency. The Overall version just compares against overall halfcourt PPP (not counting transition, which in hindsight I probably should have included). A sketch of the computation is below.
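A minimal sketch of the playtype POE idea, with assumed inputs (per-playtype possessions, actual points, and league-average PPP by playtype; names are placeholders):

```python
import pandas as pd

# Hedged sketch of PLAYTYPEPointsAboveExpectation.
# poss       : a player's possessions finished in each playtype (Synergy-style)
# points     : the player's actual points on those possessions
# league_ppp : league-average points per possession, indexed by playtype
def playtype_poe(poss: pd.Series, points: float, league_ppp: pd.Series) -> float:
    # Expected points if the player scored at league-average PPP
    # given his exact playtype diet; POE is actual minus expected.
    expected = (poss * league_ppp).sum()
    return points - expected
```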
% of games started was something I used just because it (or minutes) is generally always part of these things. I also added components called OffInd and DefInd, borrowed from seeing the PIPM priors: it's just the team's offensive or defensive rating multiplied by the share of the team's minutes the player played (so if the team played 4000 minutes, the player played 3000, and the rating is 10, that's 10 * 3000/4000). I might change the latter or add an on/off component (although then it's not really a box prior anymore? Double counting impact???), just because I feel it might not capture a guy like Wemby or other good rim protectors on bad defensive rosters. The formula, as a one-liner, is below.
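It really is just a one-line formula, shown here with the exact example from the text:

```python
# OffInd/DefInd from the PIPM-style priors: team rating scaled by the
# player's share of team minutes. Using the example above:
# team_rating=10, player_min=3000, team_min=4000 -> 10 * 3000/4000 = 7.5
def team_context_feature(team_rating: float, player_min: float, team_min: float) -> float:
    return team_rating * player_min / team_min
```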
The main innovation here is the Synergy thing, which I think was honestly a big help (particularly PLAYTYPEPointsAboveExpectation; Overall is more whatever), and the tweaks were just some pretty conservative usage of tracking data.
I think the offensive prior is quite solid, and on offense in particular the metric tested really well. I don't love the results for defense; it's a first draft, but I feel it undershoots some guys on bad defensive teams.
IMPACT COMPONENT
I used an "Adjusted Time Decayed" RAPM. First, what is Time Decayed RAPM? It's basically a fancy way of saying you give more weight (more emphasis) to more recent games and less emphasis to earlier games, and you can do this across years. TO BE CLEAR, since this is a seasonal metric meant to be descriptive as well as predictive, like EPM and LEBRON, the current season isn't time decayed. The decay starts from September 1st: it weighs the selected season fully, and then weighs earlier games by how far back they are relative to the start of the selected season.
To be clear, I made sure the samples were always 3 years total, so the decay never extended beyond the 2 seasons before the current one. A rough sketch of this weighting is below.
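Here is a rough sketch of how per-game weights like this might be computed. The season date and the decay constant are made-up placeholders for illustration; the actual rate in the metric is tuned differently.

```python
from datetime import date
import math

SEASON_START = date(2023, 9, 1)  # decay anchors to Sept 1 of the selected season
DECAY_PER_DAY = 0.0035           # hypothetical tuning knob, not the real rate

def possession_weight(game_date: date) -> float:
    # Current-season games get full weight; earlier games decay by how far
    # before Sept 1 they fall. The sample is capped at the two prior seasons
    # (3 years total), so nothing older than that enters the regression.
    days_before = (SEASON_START - game_date).days
    if days_before <= 0:
        return 1.0
    return math.exp(-DECAY_PER_DAY * days_before)
```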
Of course, this raises 2 questions.
Why is it done this way? Simple: to account for the offseason and offseason workouts and development. It makes sense, in my opinion, to say month 1 of last season is much more important than the last month of the season before that for showcasing the current year and capturing offseason development.
Why would you do this at all?
This is likely where there might be some pushback: why would I use previous-year data at all? Simple: in practice, time decayed and multi-year RAPM with less weight on earlier years generally end up similar to prior-informed RAPM. Prior-informed RAPM is RAPM where the previous year's RAPM is the prior (you yelling at your friend) instead of 0. Those results generally look much better than raw single-year RAPM, especially on the noise front. A minimal sketch of that mechanic is below.
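For intuition, here is a minimal sketch of the underlying mechanic: ridge regression that shrinks each player's coefficient toward a non-zero prior vector (e.g. last year's RAPM) instead of toward 0. Names and the penalty value are illustrative, not my actual pipeline.

```python
import numpy as np

# Sketch of prior-informed RAPM as ridge toward a prior.
# Solves: argmin ||y - X b||^2 + lam * ||b - prior_beta||^2
def ridge_with_prior(X: np.ndarray, y: np.ndarray,
                     prior_beta: np.ndarray, lam: float) -> np.ndarray:
    # Substituting g = b - prior_beta turns this into ordinary ridge
    # on the residual target (y - X @ prior_beta).
    resid = y - X @ prior_beta
    g = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ resid)
    return prior_beta + g
```

With prior_beta = 0 this collapses to plain NPI RAPM, which is why the time-decayed and prior-informed versions behave so similarly in practice.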
J.E.'s RAPM datasets run through the end of the playoffs, but since this is a regular season metric, I found a pastebin post of his 2014 NPI RAPM from around the end of the regular season:
https://pastebin.com/gT2aN0P5 - Yes, that's Miami LeBron at 36th lol. This was posted by J.E. somewhere (no idea where tbh), and he's like the RAPM god, so it was done right. The Dropbox links on APBR include playoff data, so a larger sample, but in the PI Dropbox LeBron is 1st (compared to 20th in NPI). In general PI is just much more stable. I'm cherry-picking a bit here, but you get the point.
Time Decayed RAPM is also far more predictive than single-year RAPM, regardless of whether you run it raw or do luck adjustments like BBI likes to do and Nate Walker did.
Beyond that, I did very light luck adjustments (cue the booing). Luck adjustments are weird: some people love them (BBI, Ryan Davis), some people hate them (J.E.), and I don't really know where I stand. On one hand, something like free throws I get; on the other hand, stuff like the turnover/OREB luck adjustments might be a bit of a stretch. The 3-point luck adjustments are the big controversial one, though.
On offense, I think everyone would agree you can have an impact on your teammates' 3-point shooting, even if there is some noise there. Defense is more the thing where, yeah, there are statistical tests showing it's mostly noise, and for individual players (not teams) I can buy that players generally don't have a huge impact there. At the same time, 3-point defense clearly does exist at a team level, and I believe it's likely that in at least some individual seasons, impact can partially show up through lowered opponent 3-point percentage, regardless of whether that trend holds for that player year to year. It's similar to the idea of tracking jump shot defense: it's certainly mostly noise, but conceptually there's a clear difference between Trae Young closing out on a KD three and Herb Jones closing out on one, even if their three-point defensive FG% might not look very different. It's one of those slippery slopes: okay, what do we do about midrange jump shots then? And so on. At that point it might just be better to leave it be.
So I just went very conservative with it. FTs were fully adjusted. Ryan Davis did a 50% luck adjustment on threes back when the nbarapm site discussed it more, while BBI I think does 50% on offense and 100% on defense. I did 20% on offense and 40% on defense for threes, which likely didn't do much in either direction, to be honest. In the TDRAPM I ran for the WNBA it very slightly improved prediction, but not in any practical or honestly distinguishable sense that makes up for the controversy of the assumptions in the first place. BBI does a more complex version based on research though, so they're stamped I think; I know J.E. hates it based on his APBR posts, and he's like the RAPM god, so I just decided to do this minor one that probably doesn't do anything. If anything I'd take out the offensive one, but the defensive one I think is fair, at least a small one, although I see the argument against it. A sketch of what such a partial adjustment looks like is below.
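Here is roughly what a partial adjustment of that form looks like. The source of the baseline percentage (career rate? league average? shooter-specific expectation?) is a design choice I'm glossing over, and the function shape is an assumption, not the exact one used.

```python
# Hedged sketch of a partial three-point luck adjustment: blend actual makes
# toward "expected" makes at a baseline percentage, by a chosen strength.
def adjust_threes(makes: float, attempts: float,
                  baseline_3p_pct: float, strength: float) -> float:
    expected = attempts * baseline_3p_pct
    return (1 - strength) * makes + strength * expected

# Per the text: 20% strength on offense, 40% on defense (illustrative usage).
# off_makes_adj = adjust_threes(makes, att, baseline_pct, 0.20)
# def_makes_adj = adjust_threes(makes, att, baseline_pct, 0.40)
```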
Free throws were fully adjusted, because as much as the Hanamichi "you're gonna miss" strat worked for me in high school, I don't think it works in the professional leagues lol. On free throw misses, offensive rebounds are only treated as a new possession if there is a lineup change; otherwise the trip is treated as a continuation, while the expected points are added regardless. I could see the argument for only doing this on the first free throw, or only when the final free throw miss was not rebounded by the offense, but I don't really mind giving offensive boards value here; it definitely isn't going to swing the fence one way or another, and honestly I kind of like rewarding offensive rebounders. Box score priors likely undershoot bigs a tad offensively imo, because the top-tier offensive impact guys are typically guards and wings (Jokic is an anomaly of course), and while that is of course the real pattern, bigs are likely slightly undershot offensively the same way perimeter players can be undershot defensively, taken as a whole. I feel like only counting a new possession on a missed final FT with a defensive rebound unfairly punishes OREBs on FT misses, so this felt like the best of both worlds. In play-by-play parsing terms, the rule is roughly the sketch below.
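A heavily simplified sketch of that possession rule (real parsing has to handle techniques, and-ones, substitution timing, and so on; this just captures the decision described above):

```python
# Simplified: after a missed final free throw, does a new possession start?
# oreb           : True if the offense rebounded the miss
# lineup_changed : True if either lineup changed during the FT trip
def new_possession_after_ft_miss(oreb: bool, lineup_changed: bool) -> bool:
    if not oreb:
        return True            # defensive rebound: possession flips
    return lineup_changed      # offensive board: continuation unless subs
```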
I would argue that Time Decayed RAPM provides a more accurate look at the current season by incorporating the previous seasons for extra information. Think about a guy judging how good a player is after watching this year and last year, versus only watching this year: he knows to weight the older games less, but they still give him a better picture (say he's new to basketball and can only watch 10 games, to replicate how noisy RAPM can be). However, there are real issues and concerns here in principle, which I will address.
I'll run a non-luck-adjusted version at some point, but I have an interview in like an hour (typing this right before I post it, rereading it) and want to get this out so I can point to it lol.
How do these things fit together?
Now, I do agree that it's a clear concern that Time Decayed RAPM takes the previous year into account, and this is where the Box Score Component comes in. The Box Score Component artificially reduces that past-year bias a bit, because it only takes stats from the current year; it's like in the UFC where you're falling one way and they hit you back the other way, I guess.
But also, with regard to the box score priors creating bias, this methodology can help with that too. You can kind of "set" how much the model is going to listen to the priors (how much your friend is listening to your ratings), and based on the numbers I've seen, EPM and LEBRON use a pretty strict setting there. You can basically set how far the observations are allowed to drift from the priors, the typical margin of error for your priors (the box score evaluation of how good player X is) relative to the player's "true value." It doesn't hard-lock anything though; I set it pretty close (with how the results came out, I'd say about 1 away on each end), but likely not nearly as close as EPM or LEBRON set it, and it definitely gave values way beyond that at times, which is kind of the point. A sketch of how that setting maps to the math is below.
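For the math-inclined: in the Bayesian framing, that "listening" knob is the prior standard deviation, and it maps to the ridge penalty roughly as below. The values are placeholders, not my actual settings, and the exact mechanism in EPM/LEBRON is not public as far as I know.

```python
# A tighter prior standard deviation (sigma_prior) means a larger penalty,
# pulling each player harder toward his box prior; a looser one lets the
# play-by-play data override it. Values here are illustrative only.
sigma_noise = 12.0   # hypothetical per-observation noise scale
sigma_prior = 1.0    # "about 1 away on each end," per the text
lam = (sigma_noise / sigma_prior) ** 2  # penalty for a ridge solve like ridge_with_prior above
```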
I'll get more into why I don't think it's a huge concern that the previous year is accounted for. Also, Shai is like 2nd here, so it's clearly not a nail-in-the-coffin dealbreaker, but I'll get into it more.
So, some positives and negatives. I'll get into the big picture of what this does, how it mitigates some issues with all-in-ones, and evaluate some concerns.
Benefits
1. A larger set of possessions allows for more freedom around the box score weight, which helps get rid of that bias when it's incorrect; this is likely especially important for role players or "unconventional" impact-type players.
2. Decaying from the start of the current season means the current season is fully weighted while previous seasons get less weight. Functionally, TD RAPM does better on the current year than NPI RAPM, which roughly equates to building in prior-informed RAPM (which generally gives better, or at least more stable, results for the current year than NPI RAPM). I also used a fairly strong decay rate.
3. Building the box score component only on the current year lets you shift more weight toward the current year, further reducing the concern that last year is a component in this.
4. A less stringent box score component meant more freedom to explore a box score prior that could capture more unique connections, rather than having to do a perfect job of keeping the top guys stable, since the larger sample already helped with that. (To rephrase this to sound less red-flaggy: normally there's a lot of reliance on the box scores to make the metric pass the sniff test, which aligns with but doesn't always equal accuracy; here there was potentially more freedom to capture more connections with the regression, because longer samples = less noise = nicer sniffs. I still heavily focused on stabilizing the top guys, which does generally align with the goals, since, well, they're the top.)
5. The box score component itself uses some novel tracking-derived features that I believe only EPM might use (without being anything too crazy; Points Saved at the Rim is similar to Assist Points Created in a way), but the Synergy Points Above Expectation is probably the coolest and most novel idea.
6. It's almost like a fusion between an all-in-one and prior-informed RAPM, instead of NPI RAPM plus a box score; I think there's a mutual enhancement in there where they both help each other.
Negatives
1. I do think there are individual cases where the previous year being a factor may hurt, though most of the time it can capture growing stars.
To mitigate those worries, I did a brief "analysis" (I just got the rankings of the MIP winners from 2016-2024 lol) for LEBRON, EPM, and mine.
To be clear, this isn't to say green = good in the sense that the higher on these guys the better (well, I guess kind of, but that wasn't the point), it's more to show that the last-year bias really isn't that huge an issue. As a whole, my metric was higher on these MIP guys than LEBRON (as in, it thought they were better than LEBRON did) and a tad lower than EPM. Fundamentally, you would expect it to be far lower on these guys than the other metrics if capturing sharp improvements between seasons were a giant glaring issue. There may still be some wonky results in extreme cases, although technically MIP should be about the most extreme you can go (of course, from a raw impact perspective, maybe some jumps are bigger, or maybe these guys were already climbing and had "silent" high impact beforehand too, but overall I think it shows this is not a "nail in the coffin" issue at all).
METRIC TESTING
So how did it test? And how did I test it? The way I tested it was through "retrodiction testing," which sounds super confusing but is essentially the same way the EPM and LEBRON creators tested metrics against each other. The Twitter thread is here: https://x.com/knarsu3/status/1763321501766627328
(You can see old, new, and regular LA vs non-LA RAPM. LA = luck adjusted.)
Basically, you take a player's all-in-one value for year X (let's say 2022), multiply it by his minutes played in year X+1 (2023), sum players up by team, and then get the correlation to wins. The EPM creator did it by predicting net rating IIRC, but this was easier and quicker to do and seemed more explainable (it was just faster, and I could check against his numbers to see if I'd made a huge mistake somewhere, just in case).
I did essentially the same thing. The only changes: for rookies I gave a -1.5 instead of replacement value (-2.5), and of course I used actual minutes, because I don't have Kevin Pelton projected minutes with me. Also, I think if a player didn't play the previous season, I gave him his value from the season before that if he played over 1000 minutes that year, mainly for KD and Curry. A sketch of the whole test is below.
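A minimal sketch of the retrodiction test under an assumed dataframe layout (the column names are mine, purely for illustration):

```python
import pandas as pd

# Assumed columns, one row per player-season in year X+1:
#   metric_prev : the all-in-one value from year X (NaN for rookies)
#   minutes     : actual minutes played in year X+1
#   team, wins  : the player's team and that team's year X+1 win total
def retrodiction_r2(df: pd.DataFrame) -> float:
    df = df.copy()
    # Rookies get -1.5 instead of the usual -2.5 replacement value.
    df["weighted"] = df["metric_prev"].fillna(-1.5) * df["minutes"]
    team = df.groupby("team").agg(score=("weighted", "sum"),
                                  wins=("wins", "first"))
    # R^2 between minute-weighted metric totals and actual team wins.
    return team["score"].corr(team["wins"]) ** 2
```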
So below I have the R^2 for every season from 2016 to 2024, which is how well each metric explains the variance (if you're unfamiliar with R^2, view it as a "score" for the sake of this). 2022 to 2024 are the out-of-sample years for my metric (the box score part was trained on 2015-2021 data). Overall, decent results. LEBRON does some really cool stuff with padding for low-sample players, and it's barely below EPM with that advantage somewhat taken away from it in this test, so even though there's a lot of red in that column, that's probably why.
Of the 9 years in the dataset, my metric finished first in 5, including 2 of the 3 out-of-sample years. Its only last-place finish was 2021, the year after the bubble (NOTE: I did not change the decay rate for that year to account for that; not sure if it matters much that it weighs those 8 bubble games heavily for some teams, but it could be a factor), and the overall R^2 was a good deal better at 0.683.
If you look at the Twitter thread with the different metrics for reference, a 0.03 gap in R^2 is seemingly pretty decent among these metrics, and my numbers for EPM and LEBRON mostly aligned with his testing. (The multi-year predictive metrics in his thread took several years of a metric, like LEBRON 2020, 2019, and 2018, to project LEBRON 2021, plus an age adjustment.) Mine was not nearly as good as Predictive EPM and not as good as predictive LazyLEBRON (a LEBRON variant using tracking data that tested well, but some players were funky according to him on Twitter, so they don't really use it or like it).
A multi-year version of this metric with the same methodology, once it's complete, could be interesting.
Below are the results, color coded by which metric did best each year. Overall, "MAMBA" did very well, was very consistent, and did great in its out-of-sample years.
Now, as cool as it would be to be able to replicate this:
I will make this very clear: this is a first draft of a metric. More than that, I now have much more appreciation for what goes into making all-in-ones, and for the balance between "predictive accuracy" and "players have to make sense." To be fully transparent, here are the general things I think need to be improved:
I think the defensive priors are likely not great even as a reference point. I think weighting the team defense might be a bad call; a guy like Wemby should be higher on defense IMO. Incorporating on/off in some way to mitigate this (obviously not in its raw form) seems like cheating? Still unsure here.
I wanted to make sure strong perimeter defenders were represented well, but I do think more emphasis on bigs would help mitigate some issues. Maybe splitting defense into groups? Not sure whether height, position, or some statistical factor would work better, because there will always be guys put in the wrong group, and I don't want to run a KNN or something for this, that sounds dumb lol.
It gets AD wrong; AD should be better. Giannis too. I think it fails to capture a certain archetype of defender, or maybe I should incorporate blocks into the rim points saved category somehow, because right now adding them both as separate predictors creates some wonky out-of-sample predictions thanks to multicollinearity shenanigans, at least defensively.
The offensive priors are good and correlated extremely well to OFF RTG (defense was about the same as the other two; offense was crazy good if I recall). But while Synergy playtype over-expectation can account for shot and play quality in a way, I want more emphasis than I currently have on players who finish high-value opportunities at a great rate, like AD.
It will generally struggle to identify really good players on teams that can be elite in the regular season without them, like KD on the Warriors and Kawhi on the Raptors. Because it's less reliant on box scores than some other all-in-ones (I guess that's somewhat of a niche it fills lol), certain players like that might be undervalued. This is kind of true for all metrics though.
I've played around with 4 separate "weightings" on the box score so far (by weightings I mean telling the model how much to listen), and so far the closer to the prior, the better it has generally been (literally, the one I'm going to put below is tabbed as "Closest"). I'll see if that trend continues.
So, some general thoughts there. I'll go through some of the weird overall results, but the dataframe will be below. No one will listen to this, but: 1. it's regular season only, so LeBron is a bit undersold some years; 2. it's only available from 2015 onwards; and 3. it's not meant to be compared across different years, although practically I guess it's fine.
Some Weird Results
Gonna break down some weird results here and show whether they're unique to mine or show up in a lot of metrics. I'm not drawing conclusions from whether they're in all of them; this is more to demonstrate that some "weird" results are universal, and some are just in mine.
2015: LeBron at 6, George Hill at 7 (EPM: LeBron 5, George Hill 7; LEBRON: LeBron 4, George Hill 15).
Note: 2015 to 2017 was interesting because LeBron shot up the less I weighed the box score, so kind of the "this type of stuff undersells him" vibe. Taking the 3 years together, he was overall #1 in the impact part by a lot (Curry 15-17 was the #2 stretch from 2014 to 2024, I think, or something like that).
2016: LeBron at 4 (EPM: 4; LEBRON: 2).
2017: Durant at 8 (EPM: 14; LEBRON: 9). Should be higher of course, but obviously it's universal here, mostly from how good the team could still be without him.
2018: AD 15, KD 16 (EPM: AD 3, KD 15; LEBRON: AD 8, KD 12). I think AD should be 1 or 2 personally this year; yeah, my thing just sucks at getting AD right. All of them hate KD again.
2019: Kawhi 15 (EPM: 15; LEBRON: 12). Player of the year of course; it's just low on him because Toronto did well without him playing sometimes, and it's an impact thing.
2020: Kemba 8 (EPM: 43; LEBRON: 23). Listen, if EPM is allowed to get Nurk and Zubac top 10 out of nowhere, I get to have Kemba lol. LEBRON honestly does a great job of not having random guys way too high, although maybe it's more that certain guys are low sometimes, which I hear people complain about.
2022: Luka 18 (LEBRON: 8; EPM: 17). Mostly low on his defense, but yeah, obviously Luka was higher than all of this if I'm remembering the year right.
2023: Luka 11 (LEBRON: 7; EPM: 7).
2024: Giannis 7 (EPM: 4; LEBRON: 2). This was just bad imo.
And AD is off in a lot of these. It also has Jokic way too low in 2021, but has him #1 every year since.
Now, obviously I'm literally going through my list looking for dumb stuff that pops out, and I probably missed some, but you could go through other lists and do the same (I think 2024 Curry is like 25th in LEBRON? 12th on mine and on EPM, but LEBRON generally looks really solid at the top for sure). The point isn't to disparage anyone or any number, but to say these things will always have some individual results that make you go ???, and as long as it's not completely absurd, as in random guy X at 1, 2, or 3, I think that's reasonable.
I think all-in-ones are fantastic tools, but they aren't a "how good is this guy" metric. A guy ranked way lower than expected on a team that functions well without him isn't necessarily a bad sign for that player, because impact comes just as much from "they get way better when you're there" as from "they suck when you sit"…
There's a bit of a tradeoff between going super predictive vs accuracy, unless you go for those multi-year predictive versions, I think. LazyLEBRON predicted a tad better than EPM and LEBRON, but it had Steven Adams, Caruso, Delon Wright, and Capela all top 10 in 2022, so it just wasn't as practical and never got released (all this info is on Twitter btw). In that context, I don't think the weird results in mine are too rough, considering the accuracy was seemingly a bit better, assuming the testing and everything was all good.
Also, a quick note: I have made a version for the WNBA that I'm obviously not going to post publicly. I made that one before this, without tracking data (it doesn't exist in a large enough sample) and without the Synergy Points Over Expectation (there isn't an API for that that I have access to, though honestly I might just click "download as CSV" like 100 times). Compared to things like positive residual and SPI, it already clears, but LEBRON for the WNBA is definitely better because of the padding they use. I do plan on learning all that stuff though; this was pretty fun.
FINALLY THE IMPACT METRIC
WEBSITE FOR INTERACTIVE TABLE (Preview Below) https://timotaij.github.io/LepookTable/
https://docs.google.com/spreadsheets/d/1ZMR47Z8MDX9Tt7oQy5p5vzkwLznt9ROc/edit?gid=147787302#gid=147787302 < Spreadsheet Format
NOTE: IIRC, 0 might not have been the average for defense
NOTE: Players who played under 200 minutes in a season may not be shown correctly, but that was not a problem for the metric testing
So what does this mean? Did I create some new super metric or whatever that towers over the competition?
NO
Testing and out-of-sample testing are cool and all, but at the end of the day they aren't the same as legitimate real-world results after the metric is made. To be clear, this isn't a case where I kept rebuilding the model and running it over and over until I got good correlations; this is very much the first run (or at least the first batch), and all of the runs performed relatively similarly, with the ones weighted more toward the box score doing better. (Note: I did not take this to its logical conclusion; I did not keep weighting the box score more once I saw that trend. I do plan to, but it just takes a long time to run, and I want to focus on actually making the box score priors better, defensively especially.)
When I ran correlations with offensive rating, defensive rating, and net rating (which is kind of wonky to use instead of RMSE, I guess, but it was just faster), it had some clearance offensively but was a good deal worse than EPM and about where LEBRON was defensively.
Offensively, I quite like it at this point. I think the TD RAPM aspect helps both ends, but the Synergy playtype-above-expectation aspect is a pretty strong innovation here; I should still improve the box score component, though.
For the defense, I'm pretty disappointed with my results. LEBRON's results are a bit unfairly represented here for the reasons above, plus its heavier weighting of bigs is likely more practical as an actual evaluation versus directly measuring next-year impact; I think it's practically far more useful than mine defensively and more comparable to EPM. I think LEBRON will miss on certain individual bigs and undershoot some standout perimeter defenders, and even if that's by design I somewhat disagree with it, but that's a personal gripe more than an objective one; there's a ton of practical value in the way it evaluates bigs, and if anything, comparing bigs amongst bigs solves most of the issues. For the record, mine was very slightly better than D-LEBRON, but given LEBRON's low-minute sample padding, it would likely clear mine defensively, I'd assume (although you could maybe argue that time decayed RAPM is a strong way to indirectly handle a lot of low-sample guys, which wouldn't be represented in this test?).
With that tradeoff between accuracy and "sniff test at the top," I certainly wouldn't say the top 10 of mine looks best year to year, but given that the predictive accuracy is as strong as it is, the fact that they're comparable is honestly pretty decent in my opinion. When I set out to do this, my main goal was to improve the defensive side, so seeing that it didn't really do that is somewhat disheartening, although given this is just a first draft, it's not too surprising either.
While I do think this is at least a reasonable all-in-one that's competitive with LEBRON and EPM, assuming I didn't make some awful error testing it, I wouldn't take the testing at face value to say anything drastic, and I'd probably just tentatively say it might be an interesting alternative or new kid on the block in its current state.
I think the key thing is that while it already seems pretty solid and presentable, it best serves as a proof of concept, almost. I think LEBRON and EPM, for what they set out to do, are essentially optimal metrics given their respective innovations. LEBRON does a ton of really cool things between the luck stuff and the handling of low-sample roles; I can get why some people have concerns over the luck stuff, but if it helps their model, it helps, although I do wonder if they run RAPM in R instead of Python, from seeing the luck-adjusted RAPM stuff on their website. I don't know quite as much about EPM, but its creator uses tracking data, and I'm basically sure his box score components are the best of the bunch all around. I've heard people worry that some aspects of tracking data can be noisy; at the same time, he's literally a former NBA head of analytics lol, there's a 0% chance he's including something that hurts it, especially with how good EPM looks. I actually learned about 10 seconds ago, clicking on a different tab, that it does padding for its prior stats as well. Krishna Narsu was an analytics consultant for the Mavs and is obviously a crazy smart dude, being behind a lot of the data at BBI. Those metrics are very finely tuned over a much longer period than I've tuned this one; most of my time was spent parsing stuff out around other commitments. Realistically I'd want to spend a week or two on the priors, a week or two on weighing the sigmas against the strength of the decay rate, and then a week bringing those together coherently. So far it's been a day on the priors, a day on the sigmas, and about 3-4 days between collecting the data, organizing it, dealing with WiFi issues, and writing this up.
While those are essentially the optimized versions of what they're trying to do, I don't believe mine is there yet (or really close). I'd say I've put some work into this, but at the end of the day this is pretty much day 5 of really grinding on it, although I had prepped some things indirectly because I've run this before for the WNBA; I've just been a bit busy lately. This is just a draft, but I think it serves as a strong proof of concept for this type of framework.
Beyond that, looking into the validity of high decay rates could be nice for midseason projections: a high decay rate might mean there's a sample size where things level off, or that midseason this could be a stronger predictive metric than the others, if the sample required to truly stabilize with these priors isn't that large; comparing midseason numbers would be worth doing. But fundamentally, I've got to improve the box score priors, as that's the area with the most room for improvement, particularly on defense. Offensively, I think the POE stuff might actually be a really strong innovation, and that side is in a good place (not that it can't be improved, but I'm happier with the state it's in).
Probably nerding out a bit right now, but I guess my main point is that there are pretty interesting applications for this and some room for creativity here. For now, though, this is more of a very strong framework that isn't yet optimized, compared to frameworks that have been, so I'm excited to get back into it when I can.
Now, my internship is coming to an end and that is VERY much my focus right now. I just had some free time recently and got some work done early, so this was a nice little project I'd had in mind for a while that I could finally get done. I'm not sure how much time I'll have to really finish this up in the near future, so here's draft one, day 5, I guess.
Obviously this wasn't the most formal post, but yeah, any questions, comments, concerns, or if you just wanna reach out: timothycwijaya@gmail.com, Timotaij on Instagram, Teemohoops on Twitter, and my LinkedIn (Timothy Wijaya) are probably the best places to reach me.
Note: All-in-ones aren't a ranking of how good players are.
Note: Caveats about all-in-ones from a more philosophical standpoint are beyond the scope of this post, but that's a very interesting discussion.
Note: This list does not represent how I would rank players
NOTE: As I said, this is a first draft of a metric.
I'm sure literally every team has a better version of this type of all-in-one stuff.
I can't remember if I already mentioned it, but the WNBA version of this (without Synergy playtypes; no API, but I'll do that manually, I had it before) isn't something I can share publicly (and if it is, you should feel bad for me, because that means I took an L). As a whole it worked pretty well, but it's going to need a lot more testing with smaller samples. It tested very well against other metrics out there, but my gut feeling is that something like WNBA LEBRON, which tested at essentially the same level, is likely better right now, because the ability to handle low-sample players with padding and such is important and I haven't implemented that yet; plus, individual play in the WNBA can be more volatile, and I should explore in depth what that means for the metric. That being said, compared to the other ones out there for the WNBA… I'll say that LEBRON was very, very comparable and is a very good metric in the WNBA; no comment on the other impact stuff I tested.