TIMOTAIJ TJ 2/4/25 TIMOTAIJ TJ 2/4/25

MAMBA 2024-25

Im posting this from my google docs draft that was too long

I was curious how it would hold up compared to last time in the win projections contest, but also in terms of practically how it did versus vegas, in real world testing.

For a reminder of what I did to measure its accuracy in the “real world”, i came up with a super simple win projections from it and posted it to the APBR win projections contest before the season. This was a very simple projection as I really wanted to just test the metric, no aging curves, no preseason adjustment, and it was quite literally only lm(cumalativeMamba ~ Wins, data = mambaData). Along with not including aging curves (so my projections would not take into account if players improve or decline, so it assumed year 2 wemby = year 1 wemby), I also learned after entering many people in the contest (and in general for these projections) regress results to 41 wins somewhat, especially some of the metric projections in the contest which likely artificially inflated their results when I last posted (This only applies to a few entries in the contest. With the trade deadline about to end I wanted to see where it was at, at this point, as obviously i wound’t incorporate things like possible trades
It was joint tied for first in win projections last time, and since then has done better. It has gone from tied for first, to basically in a tier of its own in first place. The gap in accuracy (Measured by MAE) between this and 2nd place, is equal to the gap between 2nd place and 10th place. 2nd place is, by the way, an average of all the predictions rather than an actual entry. It is still outpacing the multiyear proprietary predictive versions of LEBRON and EPM from BBI as well in overall projections, although they both also beat out vegas and all three had similar success against the spread
In terms of practicality, the projections right now are 21-9 against the vegas spread. If i had incorporated aging curves, it likely would better than this (and it also bet the “Over” for the mavericks and 76ers, who have had horrible injury luck so far). I was able to see DARKO’s projections vs the spread as well, DARKO is the leading all in one solely projective metric in the NBA, and it is 16-14 against the spread. Its average error is slightly worse and RMSE is better (as results are more regressed to 41 wins in most projections than mine given the nature of how I did it). Note that DARKO has preseason adjustments built in with in, aging curves, and complex projected data used as well. (I want to look into doing the last one at some point)
Since Synergy data now is private and I have to use the NBA API for it, a few very low volume players might have somewhat inaccurate priors. By end of the season most people like this are likely under the “replacement level” minute mark cutoff, but it is pretty annoying for midseason projections, so a few strange offensive players might be high up.
It isn’t necessarily inconsequential to get them out midseason as since RAPM type data is way more noisy i personally think the weight you put on the prior and things like that are likely different or should be fluid. I was frankly just trying to get it out in a day, and will likely maintain it weekly and try to edit it along as I go, but while I thought the results looked off at first I saw there was just similarity to LEBRON and EPM in some players being rated higher or lower than expected, but as a whole I definitely would say the whole year is more reliable than any midseason data for these kinds of metrics in general

Despite it performing extremely well compared to other public leading metrics and vegas consistently and doing even better than expected in real world testing, and the results (or at least by the season end) looking generally coherent for something that tests well, I still consider this more of a proof of concept as the “RAPM” portion of the model was pretty vanilla compared to some of the fancy stuff some people include like rubber band adjustments and all of that, although im not sure if that necessarily improves it in the context of an all in one or if I will implement that or just make a online RAPM tool or something instead. Despite it ranking really well I think its an open question whether it is overrating certain archetypes or how it ranks role players in general compared to top end guys, but my main goal was to see if I could make a genuinely good all in one metric that could outdo the public ones and overall the results were quite good! Depending on what happens the next few weeks I’ll probably transition to more projective data stuff or tools with regards to basketball instead, as honestly even though I think this is at the very least on par (or the full season versions at least) with the metrics on the public right now, I think within the context of creating a real “ultimate impact metric” you probably need a really multilayered approach versus a one size fits all one that the ones in the sphere use right now, and i just don’t have that data with me

Also a strange comment someone told me about:

So before, when I reworked the metric, I said I did it to improve and because I saw some issues in the metric (I think). This is still true, I do not think the original version was good, at least compared to the current one, and had some pretty blatant misses here and there and the Prior itself was not as good. Then I got sent this message

“> The creator of MAMBA specifically said he thought the original version underrated LeBron’s post-Miami defense, and then created a revised version that improved LeBron’s standing, while noting that changes were made because LeBron “w[as] underrated in the Prior” (albeit that particular comment came specifically in a bullet point about offense, but it certainly further indicates the sentiment about how the prior was revised). Improving LeBron’s standing of course wasn’t the only reason changes were made, but I don’t think anyone could fairly read through both lengthy write ups the creator made and fail to come to the conclusion that improving LeBron’s standing was a major purpose of the revision. There’s no particular reason to prefer new MAMBA for these purposes compared to old MAMBA, except that the creator happened to like the output more (which, again, was obviously in part a result of it placing LeBron higher).”

Just wanted to respond to this because, I mean, it’s just a bit peculiar and kind of an annoying accusation? I guess I’ll go point by point:

The creator of MAMBA specifically said he thought the original version underrated LeBron’s post-Miami defense, and then created a revised version that improved LeBron’s standing, while noting that changes were made because LeBron “w[as] underrated in the Prior”

I just don’t really know how to react to this just because the idea that I “revised MAMBA because it clearly underrated Post-Miami Lebron’s Defense” is a bit of a peculiar point, I don’t really know how to respond to that because I don’t recall ever saying that? this message might be referring to how I thought all in one metrics like LEBRON generally underrated Lebron’s defense, and generally led me to my general criticisms of all in ones as players have variable non-box score impact. This was a general thought for multiple players, but I brought up Lebron becuase that was the conversation I had with someone that inspired the metric, and because like in the context of a write up using the name Lebron hits harder than Caruso

while noting that changes were made because LeBron “w[as] underrated in the Prior” (albeit that particular comment came specifically in a bullet point about offense, but it certainly further indicates the sentiment about how the prior was revised). Improving LeBron’s standing of course wasn’t the only reason changes were made, but I don’t think anyone could fairly read through both lengthy write ups the creator made and fail to come to the conclusion that improving LeBron’s standing was a major purpose of the revision. There’s no particular reason to prefer new MAMBA for these purposes compared to old MAMBA, except that the creator happened to like the output more (which, again, was obviously in part a result of it placing LeBron higher).”

This is just very peculiar, but I guess I’ll try to respond to it. The only real change I made on offense that impacted Lebron was an adjustment of how I used synergy data and adding in transition points over expectation, which ended up causing a huge boost in terms of accuracy. Outside of that, I mainly fixed some minor mistakes I had in the prior when compiling the box score data, and then did what EPM did in terms of setting certain limits to some data that could be extremely high or low based upon noise.

I mean, I think the name Lebron when referring the the player came up once outside of when I read out the results where I would be like: Yeah well now the guy that everyone knew was the best player in the world is now higher than George Hill so, i mean, its just a bit of a weird accusation

So to make something clear, I adjusted how I tested the metrics in terms of how I dealt with replacement players, so the testing metrics are different between posts (as you can see with EPM for example when you look at it). I had the Old MAMBA data when I was testing the projections, and the improvement between the old MAMBA data and the new MAMBA data was roughly the same as how much better old MAMBA tested compared to EPM when I did the projections the old way.

But I mean, I guess my response to this would be

I had issue in data and in the Priors, I fixed those issues, the metric tested much better, Lebron did better in a better metric.

My understanding is this was in some sort of argument about players, and its pretty funny seeing a number I made be brought up there when I have realized im just not really someone that ever markets anything so I kind of just have this as a “hey this is a cool thing I did” type of thing that “woah I made this crazy metric that beats vegas and everything” or whatever.

But like, the idea that I made a metric and then revised it to solely try to push an agenda for Lebron is kind of absurd? There were a ton of different weightings for priors and different priors I tried out, and there were certainly a large portion of them that tested very well, and better than the old one, where Lebron was 1st most years from 2015-2020, but for obvious reasons I picked the one that tested the best. Furthermore, I made this a RS only metric, which would be strange to push an agenda for a player who blatantly coasted in that timeframe for literally the entire time period this data covers. Honestly I have to think anyone that has tried running RAPM or general all in one metrics have a hard time imagining them being used as a case against Lebron, because his dominance in terms of on court impact really becomes more apparent when you try to make one of these things.

I recall I was curious to see how “playoff mode” bron looked in this type of data from 2015-2017, so I tried compiling that data and it truly was completely ludicrous, I don’t remember. It was fairly noisy, but I remember the constant was Lebron was something insane like a +14 or +15 in a data set where no one generally exceeded +12, so that was crazy, obviously playoff data is too noisy to post an all in one playoff metric or something like that, but him being that far ahead of everyone in the dataset by a country mile was pretty funny, most people’s results had been someone muted compared to their regular season results.

My reasoning for creating an all in one was I thought my approach could help somewhat offset the bias you get from priors in all in one metrics. I think it did so a little bit but certainly not completely and I think in hindsight a multilayered approach might make more sense than this, but it definitely proved to me that the bias there exists. It was somewhat shocking to see how much player’s would move the less I weighted the box score (and thus weighted box score impact more). You genuinely could get to points where Lebron was just first place all the way through the entire warriors cavs era aside from 2018 instead of Steph being the highest one in general, without even including playoffs, with a still reasonable weight on the box score but just relatively more weight than normal on on court impact for an all in one metric. Based upon it being such a consistent trend and the sheer magnitude of it, he’s very clearly someone underrated by this kind of thing. With how much of Curry’s impact comes from outside of the box score, I expected him to be somewhat underrated by the box score priors relative to his impact, but his priors were consistently fantastic with the ones I made even in years his raw RAPM impact were not as strong, so that was very interesting, to see he didn’t have that issue of the model potentially underrated him the same way.

But yeah, pretty weird message I got from someone, about me there. Not what I expected to be emailed, wonder where it was from.

https://timothywij123.shinyapps.io/MAMBAData/

TIMOTAIJ TJ 10/3/24 TIMOTAIJ TJ 10/3/24

MAMBA Reworked Updated:

A Fantasy end goal with this metric would be for it to be able to be pitched to teams and be seen on the same level as EPM or LEBRON. However, there was an issue with the original version, while the idea was to use Time Decayed RAPM rather than regular RAPM to create less bias and put less emphasis on the box score I think is still very interesting, it may reduce some practicality of the metric early or mid season as the box score component as it is may not be as powerful as Box LEBRON or EPM’s box score component.

Therefore, without losing sight of the philosophy behind creating the metric, I worked on creating a Box Prior that could stand on its own, so this metric could work midseason with a higher decay rate and still provide a good snapshot of the current season (Simply starting the decay rate out really high and lowering it as the season goes on). Perhaps if the Prior is good enough, I could provide a supplementary single year version as well.

Fundamentally, first I would need to have it be able to work with a relatively high decay rate, so Increased the decay rate heavily:

In the original metric, the decay rate was set in a way so that:

The last game of the previous season would be at 68%
The First game of the previous season would be at 40%
The last game of two seasons prior would be at 28%
The First game of two seasons prior would be at 17%
The Current Decay Rate is set up in this way:

The last game of the previous season would be at 59%
The First game of the previous season would be at 28%
The last game of two seasons prior would be at 17%
The First game of two seasons prior would be at 8%

This creates less bias coming from the previous year, As I do want this to be a single year metric with previous years to help stabilize it.

Heavily Reworked Prior

Originally, I had a belief that the box score prior wouldn’t be as important with the time decay RAPM as a factor. While this is still true, I also did not want to create a situation where I wasn’t creating a strong box score prior for the sake of relying on Time Decay RAPM. Furthermore, upon recreating the metric, this premise was not entirely accurate, as while the test results were still generally better than EPM and LEBRON, the results themselves would vary heavily depending on the Priors, often with results that simply did not pass the sniff test. I still want this to be a bit more impact driven, but upon thinking of the practicality of this metric in-season, I did want to create a powerful box score prior still, so the data is regressed a bit more heavily to the Priors, although I’d assume not by as much as the other All in ones in the sphere still.

Offense:

Took out POE. This was my original innovation at the time for the Prior, but I found that while it lead to an overall accuracy.
Added Transition POE, as players like Giannis and Lebron were underrated in the Prior
Some very limited interaction effects (As they can cause some very weird individual results, I was very conservative with this and set a limit for how much they could alter the original data), and some things like Transition POE were slightly shifted depending on a players Overall POE efficiency very conservatively.

Defense:

Charges Drawn was heavily inflating some bigs who drew many charges but weren’t great rim protectors, but it was a very powerful predictor. While there are likely more sophisticated ways to do this, simply setting arbitrary caps on charges drawn based on analysis of the dataset by position, ended up being a pretty solid way to do things. Bigs and bigger players were emphasized by other components anyways, so this helped balance it out to an extent.
Added Field Goals Missed Against, and added a small effect where (+0.25*blocks) were added to it. Note: I don’t actually believe this improved testing results at all, but the results generally made more sense + I did want to emphasize bigs in the box score prior.

Some fo those changes on defense may not have increased overall accuracy, but I did want to emphasize rim protectors more for basketball reasons, and within the framework of this metric I did believe it did help present players in a more “in a vacuum” type way, from the prior to do so, while also having things like Charges Drawn balance the overall picture out. I made other changes and testing as well, but this was was a brief summary of the big changes

Testing

Here, I will post the Correlations to Offense, Defense and Overall, for LebronBox, and MambaBox, and below that, Mamba and EPM. EPM Priors are not available, but LEBRON priors are, and I want to test the Prior specifically as well.

The testing was slightly different for the sake of time, and since I’m just comparing MAMBABOX to LEBRONBOX and MAMBA to EPM. Since the only thing I really cared about was comparative accuracy between metrics, Rookies were given a value of 0, players who played under 250 minutes in the previous season replacement value. When trying to actually predict with the metrics in the best way possible rookies should be given replacement level values, but with diminishing returns on accuracy as you get higher up this may demonstrate the differences a tad better.

So overall The process is the same as before but more simplified: Get current minutes, if they are under 250 give them replacement values, sum them up, and get R^2 vs wins. I did the same for relative Offensive, Defensive net ratings, and overall net ratings too. This will lead to generally lower R^2 all around than in my original test, but its just done because I’m not trying to get the highest prediction but see how MAMBA and MAMBABOX stacks up versus other metrics

Box Score Prior Evaluation:

Compared to BoxLebron, the Prior here shines more offensively. Defensively it’s about a tossup. 2022 is a glaring miss for BoxMAMBA, although given that this is shared by EPM’s overall numbers, maybe it's more a statement on tracking data that year. Outside of that, the overall average accuracy is slightly higher but nothing remotely meaningful, while LebronDefense wins out in 5/9 years.

Now, I should note: When Creating the Box Prior for Defense, I made it in a way to emphasize bigs more since their impact is a bit more stable in different situations, and just the basketball reasons of 2 elite rim protectors is fundamentally different than 2 elite perimeter defenders for example. This is similar to How LEBRON does it, but I will say that general accuracy improves if I don’t do that, on the defensive end and overall, but I do think in a vacuum it might be more accurate to do it this way in terms of ranking players as long as you’re incorporating stuff to balance it out for truly elite impact perimeter players so you don’t just have a list full of Bigs in the final metric, + I think this type of approach makes more sense in conjunction with this kind of approach to the impact side of things

As I know the person behind BBI I don’t really want to display the LEBRON results in this testing + it would be a tad unfair because they do a lot of cool stuff with padding low possession players which would not be represented, but overall gap in defensive prediction between the two metrics with this methodology of testing was pretty large.

Its overall performance, off a glance, seems similar or perhaps better relatively to BoxLazyLebron which I mentioned was an unreleased Prior for an unreleased metric called LazyLebron, which was discarded because of some spurious individual results at the top. (The final results themselves weren’t released, but it was noted Steven Adams, Caruso, Delon Wright and Clint Capela were all in the top 10 for 2022-23 as an example of an issue. This is in the final metric, not the box score prior).

I will show the MAMBA results in the same format as I did before, but generally that wasn’t much of an issue for my metric, I felt the current results “passed the sniff test” much better than the previous ones I published for example, although still there were a few caveats to that and exceptions.

Last note: I would likely approach the Defensive Prior a bit differently to be more precise if/when I do a single year version of this, I wasn’t necessarily trying to get the highest prediction accuracy as I could, as a few iteration and variables I excluded led to higher prediction accuracy in testing without changing results too drastically, I just felt with the TDRAPM still being a part of it being conservative with this made sense.

EPM BOX is not available, I would likely say that EPM BOX is probably better than this, as that also incorporates tracking data and EPM Defense always tested very well, outdoing the previous iteration of MAMBA’s defense by a decent margin. I would likely try a more precise approach with tracking data if/when I create a single year version, which I am more interested in now after seeing the performance of the Prior

Overall METRIC Testing

Now, because Rookies weren’t given values at all (Thus, would be 0), the actual accuracy results from here are going to be less accurate than had I given them replacement values, but to an extent I believe this might be better for demonstrating the predictive accuracy between metrics.

In general, I would say defensively it is about a tossup, but offensively and overall accuracy wise MAMBA seems to have an edge. The gap is larger that it was before, likely from slightly different methodology and more glaringly because rookies were given values of 0 instead of replacement values. The actual gap is likely smaller that it appears on here, but MAMBA still performs a good deal better regardless.

Actual Results Breakdown

In the original writeup, I went over some results I thought were weird, and got the corresponding results here. Here is what the new numbers say about these players.

2015: MAMBA: Lebron at 6, George hill at 7. (EPM Lebron at 5, George Hill at 7, ) (LEBRON: Lebron at 4, George hill at 15)
MambaNew: Lebron at 3, George Hill at 5 - George Hill jumping up a few spots is a bit odd

2016: MAMBA: Lebron at 4 (EPM 4) (LEBRON 2)
MambaNew: Lebron at 2
2017: MAMBA: Durant 8, Lebron 3 (EPM 14, Lebron 7 ) (LEBRON 9, Lebron 3)
MambaNew: Durant 9, Lebron 1
2018: MAMBA: AD 15, KD 16 (EPM: AD 3, KD 15,) (LEBRON AD 8, KD 12).
MambaNew: AD 9, KD 16
2019: MAMBA: Kawhi 15, AD 10 (EPM Kawhi 15, AD 5) (LEBRON Kawhi 12, AD 2), Player of the year of course, its just low on him because toronto did well without him playing sometimes and its an impact thing
MambaNew: Kawhi 17, AD 5,
2020: MAMBA: AD 10, (EPM AD 8) (LEBRON AD 5)
MambaNew: AD 6,
2022: MAMBA: Luka 18, (EPM Luka 17), (LEBRON Luka 8),
MambaNew: Luka 22,
2023: MAMBA: Luka 11, Giannis 12 AD 21, (LEBRON Luka 7, Giannis 2, AD 5), (EPM Luka 7, Giannis 9, AD 10)
MambaNew: Luka 12, Giannis 8, AD 17
2024: MAMBA: Giannis 7, EPM Giannis 4 ,AD 18, LEBRON Giannis 2: AD 6
MambaNew: AD 14, Giannis 6 (Note, Originally outside of the MVP candidates it was PG and Mitchell above him, now its only Bron and the MVP candidates, which seems a bit more reasonable).

Overall, as you would expect, some of the eye popping results that were pretty general across all in ones remained here. Outside of Luka, the differences are more towards the direction of what you would expect, and generally around the EPM range. Giannis jumped up a bit in 2023 and 2024, and instead of being behind PG and Donovan Mitchell outside of the MVP candidates hes behind Bron with some seperation versus everyone else.

Outside of that: It has Jokic as #1 Every year from 2022 to 2024, similar to LEBRON and not similar to EPM, but is lower on him in 2021; Its generally higher on Curry and Lebron, but the pretty obvious fix was AD is no longer severely underrated. While it isn’t necessarily high on him still, and I do think LEBRON is more accurate in this case, its more in line with EPM most years.

To Wrap up, the goal was to

Have results pass the sniff test a little more whlie maintaining predictive accuracy
Create a genuinely good Prior that can stand on its own, so this metric can be used in season rather than just at the end of seasons (Showcased through maintaining accuracy while increasing the Decay Rate)

Overall, I think I achieved that. The results maintain despite the decay rate being increased substantially, the Prior is tested now and does pretty well, and while the results aren’t incredibly different, for the most part the differences make more sense or are in line with some odd results from other metrics rather than being alone in that regard, although its REALLY high on kemba now. I think there is potential for this to be a single year metric too, but for now I think it is at a point where it can still do what its meant to do (reduce bias with less box score weight) but still usable in-season with a higher decay rate to without being too biased towards prior seasons to get a good image of the current season.
Here are the results. Seasons can be filtered by typing them in. https://timothywij123.shinyapps.io/MAMBAData/

TIMOTAIJ TJ 9/14/24 TIMOTAIJ TJ 9/14/24

My All in One Metric, MAMBA?

It all begins with an idea.

Originally, I had a very long section on the background of All in Ones and my opinions on them and some personal caveats I had with them that flowed into the justification for why I built this metric this way. I originally thought this overall blog post was 7 pages of text, to my horror I learned it was 30 pages and CTRL+A lied to me. I also learned the 2024 dataframe did not include rookies. Therefore, I will be keeping this much more brief, but the much more in depth sections explaining All in Ones, delving more into some reservations I have with them, Justifications for those reservations, and how that leads to the framework I gave here, are in that really lengthy version of the blogpost, which can be found at the bottom of this post.

Here is a very quick summary of that section, as it is important to understand what went behind the thought process of creating this. To pin this post, this will always be said it was published after any post I make, but this metric was made at the start of September

APM is the base form of RAPM (R stands for regularization, so this is without that), and tries to see how impactful Player X is, by seeing his impact on scoring margin, controlling for the 9 other players on the court. However, it struggles with multicollinearity, assigning the right credit to teammates who share the floor alot. To understand RAPM and Bayesian All in ones, think of it as a metaphor, of your really insightful friend watching basketball for the first time, and he’s just screaming out how good he thinks players are (Impact =/= goodness, but lets ignore that for now) by saying a number that he thinks represents that. but he just thinks everyone on the team is so great because they keep winning by 100.
Ridge Regression is typically what the “R” in RAPM is, and it involves shrinking those predictions to 0. So you can think about it, like as your friend is screaming out those numbers, you are here telling him every single player is actually a 0. That sounds a bit odd, but what ends up happening is as you keep doing this, he starts realizing who the standouts truly are; you “trolling” is going to potentially affect his opinion on many players, but while he might look at a random player and say “hmm my friend says he’s a 0 maybe i’m overating him” he’ll 2018 Lebron drop a million points and think “my friend is tripping” While this helps out a ton in practice, even though your friend is super smart, you only have a limited amount of games to show him so he might just not have enough film to truly parse out who’s good and bad, accurately at least. The other issue with RAPM can be sample size, one year RAPM is very noisy (as impact data is in that scale in general at times)
All in Ones, instead of saying everyone is a 0, you try to help him out by saying a number for each player that represents how good you think they are, this is the Box score component, SPM, or simply the prior. This has 2 effects, first, its likely intuitive how much better of a baseline it is to have some sort of measurement of how good each player is separately (So Jokic and Tristan Thompson have separate ratings you tell your friend, instead of saying they are all 0s and equal), and it ends up just being a nudge so your friend can get to the more accurate answers a bit faster
This makes All in Ones far superior to RAPM or APM, however, in my opinion while this helps alleviate issues with noise, it in turn creates bias. You are applying a linear model (more on why other types of priors don’t work well in the doc, just assume its linear and ignore XGboost and that stuff if you are aware of that stuff), to capture trends on every player. It naturally will under and overshoot on certain players because box scores dont tell you the whole story. This isn’t an issue in comparison to RAPM and APM overall as it does just end up being far, far more accurate, but at the same time, with some players (Lebron on defense post miami, likely KG), you can see that they are clearly hurt by all in ones (or at least relative to other superstars) much more because of this bias consistently when cross referencing with RAPM for players with “similar stature.” You create bias, in place of noise, massive improvement overall of course, but can cause some issues for some individuals consistently

METRIC BREAKDOWN
So essentially, for now, my metric has 2 main innovations and a few other just small tweaks that I believe boost improvement, which I will split into the Box Score Component and the Impact Component

BOX SCORE COMPONENT

The Box Score Component uses typical of per 75 poss box score data (per 100 poss adjusted, per 36 minutes), and blends some very conservative use of Synergy data, and Tracking data.
Assists were replaced with Assist points created, Blocks with Rim Points Saved (DFGA * DFPerc Diff * 2), and Unassisted FGM was used, Charges Drawn was used
Created a metric called SynergyPlayTypePOE, points above expectation based on a player’s play type distribution frequency and efficiency, in a way to account for shot quality or if a player was incredible at doing hard things. This was by far the biggest boost in the context of the box score priors.
General things, like % of games started, and team Off/Def RTG * % of team minutes a player played, which are implemented in other Priors like PIPM, were also used.
Offense was far better than defense, but the box score prior is certainly in its alpha stage, especially on defense.

IMPACT COMPONENT

Big change was here, I used an Adjusted Time Decayed RAPM here where the decay rate started before the start of the current season, and would not go beyond 2 seasons prior. (So current season, lets say 2024 would be weighed fully, and then the model would not even look at 2021 data). Time decayed just means you weight games less by how far into the past they are, this is done to take into account beyond just recent things like offseason work and improvement (or decline!).

Why do it this way, isn’t it better to just look at the current year for a current year metric? In practice, Time Decayed and multi-year RAPM with less weight on previous years, is similar to PIRAPM. PIRAPM is RAPM with previous years RAPM as priors (You yelling at your friend) instead of being 0 . These results generally look much better than Raw single year RAPM, especially in the noise category.

An example here is important. Here is a end of RS RAPM I found pastebin post of 2014 raw RAPM around the end of the regular season https://pastebin.com/gT2aN0P5 - Yes, that’s Miami Lebron at 36th. This was posted by J.E somewhere, who’s like RAPM god, so it was done right. PI RAPM uses the playoffs too, but Miami Lebron is now first. In general, Time Decayed RAPM is also far more predictive than single year RAPM, regardless of if you run it raw or do luck adjustments like BBI likes to do. It’s very comparable to All in Ones actually, maybe even favorably so even compared to the best ones.

Did luck adjustments on Free Throws (I consider FT OREBs a continuation of a possession if players on the court do not change, so it did not hurt these players), and an VERY minor one on 3 pointers with less of one on offense (Controversial of course, doing it without it would yield the same results as I did it with such low magnitude, I think Ryan Davis’s set was 50%, BBI is a bit higher i think, mine was like, 25% overall in hindsight likely didn’t change anything)

Fundamentally, though, there are some concerns here with Time Decayed RAPM and how this can create bias which I was worried about before as well, which are very valid. I won’t sit here and explain how these concerns are completely invalid, but I will try to justify that these aren’t as worrisome as one might think with evidence below. For reference, Shai is 2nd in 2024 “MAMBA”

How do these things fit together

The Box Score Component reduces that past year Bias because it only takes stats for the current year. It’s like a left hook as you’re falling from a right hook. To go into the benefits overall and alleviate obvious potential concerns. You can kind of set how much the model regresses to priors (How much your friend is listening to your numbers). TDRAPM creates a larger, more stable sample, meaning you can set this number to be less strict, and also create SPMs that can potentially capture less connections, whereas normally you have to absolutely NAIL the top guys. being super high so it’s passes the sniff test + for messaging. Really, you still have to do that to an extent, but there’s just less reliance there, if that makes sense. Essentially, it’s a mutual enhancement between the larger sample and the SPM helping each other out.

I do think that there are individual cases where the fact that the prior year is a factor may hurt, a huge concern would be if it can capture players who made big jumps. Therefore, I got every MIP and their ranking in MAMBA (that’s closer here) EPM and LEBRON.

MIP Rankings For all 3 Metrics

Note: Green = Lowest, Red = Highest Ranked
THIS DOES NOT MEAN GREEN IS GOOD AND RED IS BAD, JUST TO SHOW THIS DOES NOT UNDERSHOOT GUYS WHO IMPROVED A TON

To be clear, this isn’t to say green = good in the sense that its more accurate, but it is more to show that “MAMBA” does not have issues with players who have big jumps or changes in performance. (Perhaps it would be better to look at players who had massive RAPM changes, but that seemed a bit less practical and a less strong/clear message) In theory, MIP = biggest jump, of course in practice that may. not be the case.

Now, I won’t handwave this concern away, it absolutely creates a bias. The way I would say its a value-add, is that you alleviate the bias that the box score or SPM can create but create some bias with this previous year being weighted.

Last thing before I show the metric’s performance and accuracy in testing: Some will tout their all in one as the literal impact a player had, many of these people are far smarter and more qualified than me. even though I do think even in this proof of concept form this is generally at least somewhat competitive with the stronger All in Ones in the public sphere, here is my stance: Just like how we use (practically) Raw Impact as a player evaluation tool to estimate True impact, All in ones are all ESTIMATIONS of True Impact. I’m never going to say “Player X was a 7th impact player” because he was 7th in MAMBA, or any other all in one, because they are all estimations. EPM and LEBRON do the best in testing, (not including MAMBA :P), but they can vary by 100 spots or more for some players. And its not even necessary that any of them are right on an individual player, maybe they all miss the plot! That being said, if they all paint a clear picture on a player’s impact not matching your preconceived notion of them, that may be a sign that its something to look into, but it is NOT irrefutable proof that you are wrong.

METRIC TESTING

To test this, I incorporated something called Retrodiction testing, using methodology similar to how EPM and LEBRON creators tested these metrics. I did mine most similar to Krishna Narsu’s (LEBRON creator, although he used projected minutes and I used actual minutes, he did do actual minutes separately). Essentially, I got players “All in One Score”, Multiplied that by minutes next year, and then grouped players by team. Getting the overall sum gives you a “Team Score”, and then got the R^2 (Correlation Squared) to team wins the next year. R^2 = how well it explains variance, but just think about it like how well it can tell which teams are better than which for now (Don’t destroy me for explaining it like that stats people its just for simplicity :D). low minute players were given replacement player values , rookies were given a tad better than that but still strongly negative.

How did it perform? Keep in mind, this is a stage 1, proof of concept version. Most of these metrics have been tested and gone through lengthy development and testing before they were deployed, and subsequently editted and further improvede upon over the course of real world testing and among years, This was quite literally was from the first batch of results (I made 4 batches with the difference being how important I told the model to weigh the box scores) I made. In this overall process, if you take out the time I spent collecting data, running the model, testing the metrics among each other (as I did not test more than once), and the longest part of all, writing this, between the week I’ve spent doing this, genuinely only about two days was spent on the entire process of building the model itself, with very little fine tuning having been done.

With those caveats (excuses) out of the way, it performed the best by a very decent margin. It was the best in 5/9 years, including 2/3 years out of sample. The out of sample years especially it performed better, LEBRON had an average R^2 of 59.2, EPM one of 58.73, Mine had one of 64.8. The results I had overall were similar to the ones that were shown on Krishna Narsu’s twitter (For LEBRON and EPM), so I doubt any deviation in testing or large error occurred. (“it would be the, Compared to what Steve suggested” post).

It likely performed better than the unreleased metric, LAZYLEBRON as well based on the relative results, which was a BBI metric that was incredibly predictive but produced strange individual results (Like Steven Adams, Caruso, Delon Wright and Capela all top 10 in 2022) and therefore was unreleased. (all of that info is on his Twitter, that is where I got it from). A multi year version of this metric (Multi year in this context = getting MAMBA from 2020, 2021, and 2022, and using that to test 2023, and having that be the model) would be interesting, as that was the multi year predictive thing tested in the below twitter thread (It wasnt just LEBRON or EPM using multi year data, it was using different years of LEBRON or EPM values). The thread below has Krishna Narsu’s results testing the metrics. assuming the gaps are around the same, a 0.03 R^2 gap is at least seemingly notable.

https://x.com/knarsu3/status/1763321501766627328

Now As Cool as it would be to be able to replicate this:

I will make this very clear. This is a first Draft of a metric. More than that, I do have much more appreciation towards what goes into making All in Ones and that there is a balance between “Predictive accuracy” and “Players have to make sense.” To be fully transparent, here are the general things I think need to be improved on:

I want to have more emphasis on Bigs, Perhaps separate models? I think doing a KNN to get different groups of defenders sounds like something that sounds nice but won’t actually work practically with this kind of model
It likely undershoots AD and Giannis defensively, I want to incorporate Blocks into the Rim Saved metric somehow, as when having them as separate predictors multicollinearity shenanigans occur, and interaction effects create no-nos, but I can think of a few ways to incorporate it. (So blocks aren’t currently part of it, just rim points saved)
Synergy POE was a super good thing to add, it might undersell guys who are efficient because they create crazy good opportunities with their movement vs their team doing it, AD and Wemby provide unique value there. If I could separate Rolls and Pops that would be nice but I don’t have the synergy API.
It will generally struggle to Identify really good players on teams that can be elite in the regular season without them. KD on the Warriors and Kawhi on the Raptors, because it is less reliant on box scores than some other all in ones (I guess thats somewhat of a niche it fills lol), certain players like that might be undervalues. this kinda is true for all metrics though, but still
This is the first run, playing with different box score weighings and decay values is on the agenda, but I want to focus on the SPM portion of the metric, particular the defensive part, because that is certainly still in stage 1. Perhaps different box score weighings for defense and offense too?
Strange Individual Results
All in ones always have some weird results at an individual level, so before this into the trash because player X ranked a bit weirdly, I briefly went through some weird ones off a very cursory look and compared it to other metrics. If they all have a similar weird result, that isn’t to say that they are all right or anything, its to say that its just an All in One thing versus a flaw with MAMBA specifically, but certainly some players I think my metric will “get wrong” more than other ones, which applies to all of them of course. Here is a breakdown of some odd results and some general commentary.
2015: MAMBA: Lebron at 6, George hill at 7. (EPM Lebron at 5, George Hill at 7, ) (LEBRON: Lebron at 4, George hill at 15. )
Note(2015 to 2017 was interesting because Lebron shot up the less I weighed the box score, so kind of the “This type of stuff undersells him” vibe, he was overall #1 taking the 3 years together in the impact part of it by alot (Curry 15-17 was the #2 stretch from 2014 to 2024 I think too, or something like that. Also it gets AD very wrong, its very low on AD for some reason which I disagree with).
2016: MAMBA: Lebron at 4 (EPM 4) (LEBRON 2)
2017: MAMBA: Durant 8 (EPM 14, ) (LEBRON 9)
Obviously should be much higher, but mostly from the warriors doing well when he was off the court
2018: MAMBA: AD 15, KD 16 (EPM: AD 3, KD 15,) (LEBRON AD 8, KD 12).
I think AD should be top 5 with the RS he had, but yeah mine is insanely low on him for some reason, its a flaw I think
2019: MAMBA: Kawhi 15, (EPM Kawhi 15,) (LEBRON Kawhi 12), Player of the year of course, its just low on him because toronto did well without him playing sometimes and its an impact thing
I think AD should be top 5 with the RS he had, but yeah mine is insanely low on him for some reason throughout
2020: MAMBA: Kemba 8, (EPM Kemba 43,) (LEBRON Kemba 23)
This is a “what the hell result”, like EPM having Nurk or Zu top 10 in some years, LEBRON does a good job at not having really odd players top 10 consistently actually
2022: MAMBA: Luka 18, (EPM Luka 17), (LEBRON Luka 8),
All of these undershoot him but EPM and MAMBA absurdly so, LEBRON does the best here but obviously Luka is Luka
2023: MAMBA: Luka 11, (LEBRON Luka 7), (EPM Luka 7)
Same as Prior except mine stands out at undershooting him
2024: MAMBA: Giannis 7, EPM Giannis 4, LEBRON Giannis 2:
Relevant: MAMBA Defense: Giannis 1.2 (87th) EPM Defense: Giannis 1.8 (72nd) LEBRON Defense: Giannis 0.9 (64th)
EPM and LEBRON are both reasonable rankings, mine is not. Really obvious glaring issue with Mitchell above him on defense on mine, on EPM they are actually fairly close (Mitchell at +1.5, Giannis at +1.8) , in LEBRON they are further apart with Mitchell at 0.2 and Giannis at 0.9, but all of them are a bit lower on Giannis defensively this past year (which I don’t agree with, to be clear). That being said, I would say mine did a good job ranking his defense during his DPOY seasons and 2021. On EPM: 2019 24th, 2020 9th, 2021 64th. on LEBRON, 2019: 3rd, 2020: 7th, 2021: 5th. on MAMBA: 2019 8th, 2020: 2nd, 2021: 5th. EPM stands out as being a bit odd there, LEBRON and MAMBA both do similar jobs, with MAMBA probably undershooting his 2019 DPOY campaign and LEBRON undershooting his 2020 DPOY campaign. Knowing how his raw RAPM and impact data looked like during this stretch, I would argue the closer to 1 or 2 the better personally. Not sure what to make of this.
AD being off in a lot of these, It has Jokic way too low on 2021 at like 7, but has him #1 every year since by much more than the other metrics.
This might look bad, but I’m literally looking through to see things I find stupid about mine, you could likely do that for everything (I think 2024 Curry is like 25th in LEBRON? 12th on mine and on EPM, but LEBRON generally I think looks really solid at the top for sure) The point isn’t to disparage anyone, or any number, but the point is to say all of these metrics have some weird individual results, mine included. I absolutely think on many of these mine is just flat out completely missing, hopefully once I edit the box scores it will mitigate that issue. Pretty much all of these players (Aside from Kemba and Hill) I consider top 1,2,3 or 4 in those years.
I think All in ones are fantastic tools but they aren’t like a “how good is this guy” metric, a guy being ranked way lower than expected on a team that functions well without him isn’t necessarily a bad sign on that player, because impact comes just as much from “They get way better when you’re there” To “They Suck when you sit”
There is seemingly a tradeoff between how accurate/predictive a metric is and some really odd individual results. Based on the accuracy tests at face value, I think the tradeoff is worth it at least considering what seemingly happened with LAZYLEBRON’s results
WNBA version for this exists and it is not public because I am in an internship (and hopefully will become full time!). It tests better than anything available but for the WNBA specifically I much prefer LEBRON right now for the low sample padding.
In terms of not having glaring players at the top or having a top tier guy way too low, I think LEBRON does the best job there, off a quick glance
Im pretty fine saying the “misses” here are simply misses caused by bias or just noise clouding reality, many people within the analytics space might be against me having that opinion if all the all in ones agree on something, but that’s just my personal take in some situations where it just doesn’t pass the sniff test. I do think when those discrepancies exist, its worth looking into though, just that all in one results that really deviate from general opinion is more a potential signal than some sort of proven, irrefutable answer, if that makes sense.
FINALLY THE IMPACT METRIC

WEBSITE FOR INTERACTIVE TABLE (Preview Below) https://timotaij.github.io/LepookTable/

I posted an excel file below, but above is a more appealing/responsive viewing format.
https://docs.google.com/spreadsheets/d/1ZMR47Z8MDX9Tt7oQy5p5vzkwLznt9ROc/edit?gid=147787302#gid=147787302 < Spreadsheet Format

NOTE: Players who played under 200 minutes in a season may not be shown correctly, but that was not a problem for the metric testing

So what does this mean? Did I create some new super metric or whatever that towers over the competition?

More than that, this is the “Proof of Concept” phase. My hope is some of the odd individual results will be fixed once this metric gets perfected more so. I have things I blatantly need to work on, the Offensive results overall were great in testing but the Defensive results weren’t quite as stellar (Did retrodiction testing with offense and defense). They were still good, but were a good deal worse than EPM, and defense was where I thought this would shine. This is likely because of issues with my SPM, this methodology means it can handle a worser SPM in general but that is not an excuse to have one that is not particularly good.

For me, this serves best as a “Proof of Concept”of this type of framework. In its current stage, with its testing I would tentatively say it likely is comparable to things like LEBRON or EPM, even though it may perhaps have some odder results at the top end from time to time. Maybe it does better at predicting people outside of the top 10? But EPM and LEBRON are essentially perfected versions of what they are within their framework, while this is very much scratching the surface with its framework. Each one will have certain flaws and misses and biases, but beyond some exciting test results, I think its valuable to be able to know more clearly where those biases come from even if it has less overall “incorrect” bias in that regard (with this being the last year stuff).

Obviously this wasn’t the most formal post, but yeah any questions comments concerns or if you just wanna reach out. timothycwijaya@gmail.com, Timotaij on instagram, Teemohoops on twitter, and my linkedin of Timothy Wijaya are probably the best places to reach me.

Note: Caveats over All in Ones on a more philosophical standpoint are beyond the scope of this post, but thats a very interesting discussion
Note: This list does not represent how I would rank players, AT ALL.
Note: As I said, this is a first draft of a metric.
Note: Public sphere is important, im sure teams have better versions of these in house
The WNBA version was without synergy stuff which I will have to add downloading CSV files manually, and of course no tracking data, it performed better than any All in Ones in the WNBA scene by a large margin, although I prefer LEBRON for the WNBA (LEBRON is not readily available for the WNBA, its similar in its predictive accuracy there).
Long version of this (It is unedited, has many grammar errors and generally much more rambling and isn’t super professional). Perhaps just the first part with the explanation and breakdown of All in Ones and justifications of the caveats I have with them and the issues I had with biases I mentioned is useful though. https://www.teemohoop.com/mamba-or-lepookie
Huge thanks to Seth Partnow and Ben Alamar for giving insights during the Las Vegas SBC program (We did not talk about this or anything, but it they helped me a ton with how to approach my internship), to Eli Horowitz for giving me a chance with giving me the internship opportunity as I likely wouldn’t be able to do anything in basketball if it was not for that, and to Nathan Hollenberg for helping me out with some questions I had on RAPM samples and for all the wonderful advice he gave me during our coffee chat!

TIMOTAIJ TJ 9/14/24 TIMOTAIJ TJ 9/14/24

Four Factor RAPM

Something that I thought would be fun to do was Four Factor RAPM, it ended up being pretty straightfoward so I did it last night after dinner and I ran it overnight. RAPM can be seen as a fancy way to parse a player’s impact from teammates, but some consistent weird results can pose more questions than answers. For the full breakdown of RAPM and some concerns I have with it, you can find that on this the first few sections of this post here. https://www.teemohoop.com/mamba/Blog%20Post%20Title%20One-mm8gk

If you go there or are visiting for the first time, maybe check out the All in One I made, the updated (non crazy long) write up is available at the top of the link I sent, or you can go to it directly here https://www.teemohoop.com/mamba/Blog%20Post%20Title%20One-mm8gk-cy9wh

Most of this is copy and pasted from Linkedin, but a few extra examples at the end:

Why does Jokic’s defensive RAPM always end up being so good?

Why are 2023 and 2024 Embiid and post-GSW Durant's ORAPM not quite as high as the top offensive players in the NBA?

Why does Caruso’s defensive RAPM always look like he's the greatest bald guard defender in NBA history?

A flaw of RAPM, (beyond noise and ignoring context) , is it doesn't explain the “Why.”

So, many treat it as a signal to see if something is worth analyzing or looking into. While I this opinion over this kind of data, I thought making it point out the “why” when its not noise, might be possible, to an extent.

Last night I thought it would be interesting to make “Factor” RAPM, instead of getting impact on point differential, you get impact on TS%, Rebounds, and TOs. (Think the four factors, combining FTR and EFG and FT% into one). You end up:

OREB impact

OTOV impact

OTS impact

DREB impact

DTOV impact

DTS impact

You then scale it in a way so it generally adds up to Offensive and Defensive RAPM. Which I did just running linear regression using those variables on the component, and getting the coefficients and multiplying. I did this in a way so a positive number always was good, and a negative number was bad. (Positive DTOV = force more turnovers).They don't add up perfectly, but generally add up pretty close (0.1 away on average).

Practically, how does this help with questions from earlier?

With Jokic: As much as his hands/positioning/general Defensive IQ is praised, the entirety of his positive defensive impact pretty consistently is from his insane impact on the defensive glass.

With Embiid and KD: I’ve heard arguments its from a lack of playmaking, but there’s isn't much evidence here suggesting this is what’s behind their lower than expected offensive RAPM. Durant’s impact on TS% was better than Jokic from 2021-2023, and Embiid was 3rd his MVP year and 2nd this year, despite his offensive RAPM being outside of the top 10 both years. Its mainly from negative impact on the Offensive Glass, especially for Embiid

With Caruso: He led the league in “Defensive Turnover Impact” for the last 3 years

I caution making any sweeping conclusions off this or raw RAPM in general, especially off two year samples, but I believe there are interesting practical takeaways you can get from this type of analysis, at least when its not just noise. Sometimes you can even see if a player's ranking might be dubious for something out of their control, if a perimeter player who doesn't crash the glass has an exceptionally high Offensive RAPM one year because OREB impact, and its a one year thing that isn't consistent year o year, maybe its noise or luck.

It wouldn’t be difficult to add more years or change the bounds of years depending on if the NBA PBP Possession format from the API is the same

data from 2016 to 2024 (2 year stints): https://timotaij.github.io/FactorRAPMScaled/

Raw version (unscaled to RAPM) from 2016 to 2024 (2 year stints): https://timotaij.github.io/FourFactorRAPMRaw/

Running a WNBA version of this is INCREDIBLY noisy because, well RAPM is incredibly noisy for those samples, but some interesting practical takeaways from that, thats private though lol.

I haven’t done anything on MAMBA since I made that blogpost, but might take out the offensive and defensive team * minutes effect that was originally there.

TIMOTAIJ TJ 5/28/19 TIMOTAIJ TJ 5/28/19

Long, Unedited Version of All in One Post

It all begins with an idea.

NOTE: If you are reading this for the first time, please go to this link instead:

https://www.teemohoop.com/mamba-or-lepookie/Blog%20Post%20Title%20One-mm8gk-cy9wh

What are All in Ones, What actually Goes Behind them?

So through this week, I made my version of all in one metric I had conceptualized awhile back, at least a first draft of one, and I will get into that below, but I think its important to explain what an All-In-One metric is too. Most explanations online either cut a million corners or it’s an online statiscian saying “It’s just this simple formula :D” and pulling this out,

Since I hope some people Vegas are reading this, as well as maybe anyone else in the basketball scene that stumbled on my Linkedln post, I think that its worthwhile to explain fully in a more simple and easy to understand way what All in One metrics really are so you can make your own opinion on what you think of them. This won’t be the most formal intro to the stat or a metric you’ve ever seen, but I hope it’s interesting. Just skip through if you want to skip to the breakdown of the metric itself, but I would probably say at the parts of what the number is comprised of is kind of important and how it tests, and that section kind of flows nicely from this section.

Before getting into All in One metrics, its important to understand what RAPM, the backbone of most All in one metrics, is. “APM” from RAPM stands for Adjusted Plus Minus. which takes into account 11 things, the player of interest, the 9 other players of the court, and the scoring margin. It tries to see how the “player of interest” effects the scoring margin, while controlling for 9 other teammates as factors. However, the issue with APM is the idea of multicollinearity, which simply means it has a hard time distinguishing between teammates who play in many lineups together. Basically, it struggles to assign the right amount of “Credit” to people who play together alot, it will think because I am on the court with Lebron, it gets “tricked” into thinking that Im really good at basketball and not that Lebron carries me.

This is where the “R” in RAPM comes in, to mitigate this issue and distinguish between teammates who share the floor alot. It can help say Lebron is the one carrying the team and 2018 SNL recruits Childish Gambino, Pete Davidson, and Kenan Thompson, weren’t actually a secret top tier players, they just played with Lebron in many lineups. R stands for Regularized, and in general in means Ridge Regression for RAPM. now that sounds all fancy, and this is usually where someone throws a giant math equation or says “Just know it does this” But its actually pretty important to get this part to really understand in depth how All in Ones work and why they work, so I’ll give a real world example to illustrate this.

Imagine you’re a teacher have 2 troublemaking twins, one named Marco and the other one named Kenji, but you know one of them is “The bad apple” and the other one is just following their lead, Similar to how Lebron is carrying the team to high score margins and Random Player “Justice Young” is just along for the ride by being on the court with him alot. One of them starts shouting and the other one follows, and you want them to stop, and figure out the bad apple. You get all fake dramatic and yell at them to shut up, and you keep doing this over and over again, knowing the not so bad kid would feel bad and chill and the more consistently problem child will continue to troll. Eventually, you keep doing it enough, Kenji starts feeling bad and chilling out since he wasn’t really like that, while Marco just keeps screaming since Marco sucks, you’ve learned Kenji was chill and Marco was the real trouble maker.

Practically, Ridge regression is a similar concept. Instead of kids yelling, its their “scoring margin values” (Or how impactful they are), you’re pushing the values to 0, and instead yelling at them, its high score margins. So think about it this way, You are trying to find out the troublemaker between the Twins Marco and Kenji (finding who is the “driver of the high scoring margins” between Lebron and Justice Young who often play together in lineups), so you start Punishing them and telling them to quiet down (Shrinking Coefficients to 0), and as you keep punishing them, Kenji, who wasnt truly a problem child but just following Marco, begins to behave while Marco continues his mischief and is less effected by the punishments (Player X value starts going down, Lebrons stays high up and is less effected by the punishments as his “effect” is more consistent and is driving it more.).

The main confusing thing there is shrinking coefficients (PlayerValues) to 0, but just view it as “punishing” and you basically get the gist of it. I guess one better way to explain it would be, imagine instead of a advanced model, its a guy screaming out numbers of how good he thinks Lebron and Justice Young are every time he sees them play, and every time he does this you tell him they are both 0/10 players (You a hater), his “screams” are the coefficients (model guesses. While not a perfect representation, you can view it as the model keeps making guesses on how impactful it thinks players are, until it’s happy with its choice or runs out of film to watch) and you telling him everyone is a 0/10 player is “shrinking his ratings towards 0.” Hisranking of role player Justice Young is going to be more affected by you trolling him into saying he sucks, than his ranking of the greatest player to ever pick up a basketball where he can clearly see the greatness.

And thats pretty much RAPM! Why was it important to understand All-In-One data? Well, there are multiple forms of All in Ones, but the one I made, LEBRON, EPM, and ESPN RPM back when it was created (Its creators have since left to NBA teams, I’ve heard that the metric is a bit weird now since they left), are “Bayesian Prior Informed RAPM” That sounds super fancy and I have absolutely no idea why people don’t ever explain it normally, but its actually simple (like genuinely, not in the, “its suuper simple and then throwing an alphabet with an equals sign at you”).

Instead of the punishment being you punish everyone’s values to 0, you “punish” different players numbers specific to how good you think each player is. This is where the Box Scores usually come into play, you just use box scores to create a number for each player that gives a rough estimate of how good that player is. This has a HUGE effect on RAPM.

If that’s confusing, You can think of it this way, Imagine RAPM is your friend who has never watched basketball before, trying to learn about basketball in a limited amount of time (in this case, time would be the possession sample size it has to learn from).

Without Regularization, he just thinks everyone on the 2016 Warriors was a 10/10 because they won by 50.

With Basic Ridge Regression, which is pushing the values to 0, when he said he thought they were all great you kept saying they all actually suck, and he kept watching and said “hmm I guess some of them weren’t as impressive as I thought, but I thought that Curry guy was pretty good though!”

With Bayesian Regression, as he is watching you are giving him you’re complete honest opinion on how good every single player is while he watches, and keep saying your opinions on those players instead of saying they are all 0s

This number you are saying to him, is the “PRIOR”

You see the difference? Keep in mind in this case your friend is a super genius and will pick up on things eventually, but he’s a bit slow and just needs a lot of film, or a bit of a nudge. With a limited amount of time, getting him closer with those good opinions will really speed up the process as often he won’t have enough time to get the answer right.

That in a nutshell, is what much of All in One data is, at least a large proportion of the best ones. They create a number represent how a player is using box score data, and that becomes the prior that you scream at your friend watching the game over and over again.

Caveats to this Approach in All in Ones:

It all sounds really nice, but there are some practical issues, 2 of them I will put down here in my opinion (I’m not gonna go into the caveats with this type of approach for evaluation for now too)

1) that it takes a pretty big sample for your friend to truly get players right

2) the priors themselves (Your opinions your telling your friend watching the game) can skew his opinion in incorrect directions.

My version tries to tackle these in its own way (in the week I made it lol) but here’s kind of in depth an explanation of the issues to demonstrate why I felt they would be interesting to tackle this way. Feel free to skip this if you don’t really care.

(For this explanation, you can think of Noise = things distracting from the true value, imagine you’re trying to listen to lyrics of a song to memorize it but the baby starts screaming at the same volume so now you think the song has some crying in it)

The friend example was good as a visualization and to demonstrate it in a more human kind of way, but i’m throwing it away from here because it kind of takes away from the point of what RAPM in its raw form is and the benefits of it. It’s a impact metric only attempts to parse out the impact player X has on his team’s scoring margin, accounting for the 9 other players on the court. it cares about NOTHING else. Simply put, its unbiased. Sample size is an issue and short term RAPM is noisy, but some people tend to mistake this for “RAPM just doesnt say anything valuable in small samples.” RAPM is a raw impact metric, it. is a measurement of raw impact which in itself is used by people as an estimate of “True Impact”. Whats the difference? Raw impact is simply the points when you go on and off the court adjusting for teammates, True impact is whether or not you are actually the reason or a factor for that score or if its just coincidence you were there when something good happened (You happen to be there when good things happen that you didn’t effect indirectly or directly in any true way at all). A lot of “Raw impact” is simply noise, but it isn’t necessarily always noise, which I think is a key distinction.

With low but reasonable sample RAPM (lets say a season) you do get a ton of wonky results, but much of that “noise” is simply the instability of short term impact data itself, most of the time (key word is most, as in like more than half the time of course) you aren’t going to see wonky results that aren’t apparent in the raw impact data when you look at a player amongst their teammates.

This created an interesting debate in some places I saw back when All in ones first came out, I was like 15 at the time, but from what I remember some people were a bit unhappy and saying all in ones killed the point of this kind of thing. to be clear, I disagree with that take, but I do understand where it’s coming from. With the priors, You end up reducing noise but creating bias, but on the whole this tradeoff is 100% worth it. It’s just an issue on some individual cases at times which I’ll get into more below, but as a whole thats more for when people get to fixated on marginal differences and rankings between players

The second issue is that the Box Score Prior itself isn’t so simple to make. The way it is made is you get stable samples of RAPM, and you train a model that can take inputs (Box score numbers) that can predict a player’s RAPM, and make that the Prior. If you’re part of MSBA reading this and on the more technical data side, you might think “XGBOOST” but no, that doesn’t work because from my understanding that the errors in non-linear models tend to be unacceptable, and in my brief experience running it for this it was awful. Even interaction terms create large unnaceptable errors at an individual level. For a Draft model, sure, Boost dat, I even did one for my internship and my portion was pretty solid (I think I used XGBoost or Lightgbm i dont remember tbh), but not for this kind of thing.

you WANT outliers, at least in terms of really good player X and Y, you want it to “overshoot” on certain players and superstars in some years to stabilize when noise causes some players to be underrated. On a more meta evel, you want some players to be overshot for the sake of a metric looking more respectable, well and for the sake of messaging to be honest. If the ONLY goal was getting a high prediction on RAPM this would be easy, but you kind of have to have some semblance of common sense with your results. That isn’t to say that deviation from general opinion is wrong, having a guy like Caruso in the top 20 or something is completely fine in my opinion when his impact signals are THAT strong (all in one metrics are NOT a ranking of how good players are in a vacuum, to be clear), but if your list has like a bunch of role players in the top 10 and superstars out of the top 50, something is probably wrong. That being said, if certain players are consistently far different from preconcieved notions of where they would rank and 97% of others aren’t really, that’s a valuable data point, but people can often make a bit too strong conclusions from that. A box score prior does help RAPM become far more stable, and also can help create a final metric that isnt completely laughed out of the room. But here’s the thing, its a linear regression, you are applying a generalized pattern to the entire NBA, you ALWAYS are going to overshoot or undershoot on certain players. I have gotten push back on this statement before, but while while it undoubtedly creates better observations overall in a GENERAL level, there are CERTAINLY some players who are overpushed or punished on an INDIVIDUAL level. For me, while I do find all in one data valuable, I don’t view it as a raw measurement of impact like some other people view them. While RAPM has noise, All in Ones have Bias. 99% of the time, that small amount of Bias is worth it and helps a CRAZY amount, but that Bias can also lead to fundamentally incorrect predictions at the individual level where perhaps the Noise wasn’t truly far from reality. To me, both have their place when analyzing a player, All in Ones much more so especially if you only can pick one, but also, watch the game lol

Next two paragraphs are a slight case example with Lebron, you can ignore this if you want to

A case example. Awhile ago, I saw a pretty bad Article on BBall index.com. Now, I do really enjoy the site and like what it stands for, and to be clear, this WAS NOT WRITTEN BY TIM (also known as Cranjis Mcbasketball), Tims a smart guy and he’s pretty chill to talk to so he wouldnt write something like this, but the gist of the article was basically one of the other writers clickbaiting off of the olympics doing a “Lebrons not top 10 and I’ll tell you why with FACTS and STATS” and it just being a guy pulling out the LEBRON metric…

But it actually is relevant to this, because Lebron represents probably the clearest example (That I know of) of a high profile player that represents a bias. While I don’t want to go on a 10 page tangent defending Lebrons honor from Lebron on a spreadsheet in Capslock, what I’ll say is that, especially on the defensive end, for pretty much his entire post Miami career (at the very least),any available “Box Score” component for an all in one of Lebron’s data severely undershoots defensively. The 2 exceptions, 2018 and 2022, are the only years where his actual raw defensive impact data wasn’t good (according to RAPM). This is the case for LEBRON, DPM, and Mine (I’ll release the overall numbers, I can give the priors to anyone who asks but this is a first draft still so need to do some tuning) etc. . On a deeper level though, despite his great box scores, what you end up seeing fairly consistently is the more you weight box scores, the less impressive his All in One data can be. This doesn’t mean “Hey maybe his impact data overrates him” because that’s really not how it works if its this consistent for long periods of time for a high production player, it means Lebron is better than his box score production indicates. To be clear, Lebron’s career age adjusted impact data is by far the greatest in history, and if you only get playoff RAPM (there are caveats to doing it that way beyond the scope of this post), he’s basically a lone dot at the top even without adjusting for age, and thats with him being in LeCoast mode in the Regular Season since 2014. All in one data ironically shrouds the case here, but for his Career Lebron is pretty much the Undisputed king in the realm of impact data (Although obviously now he’s no longer undisputed #1 there). Im sure there are other examples (I feel KG would be another guy?), and sometimes this is by design (LEBRON tends to give extra weight to rim protection from my understanding, which helps it more in terms of predictive value since top tier bigs defenders are better building blocks than top tier perimeter defenders, even if it might not show up on raw impact stats for some of the non absolute top tier DPOY type bigs), but you get the point.

End of Lebron stuff

The box score prior is where a lot of the separation itself happens between these metrics. Its actually where people do unique stuff, but overall I think of an All in One as an estimation and some treat it as the asnwer. I don’t know how good my metric is or how the final version will be (I’ll show the results of my retrodiction testing before I put it down below, it actually performed super well, in and out of sample, but I still have alot to work on it I literally started doing this 5 days ago and 2 of those days I was out and about.) But regardless of how good this metric ends up being I don’t think I’ll ever phrase a result like “Player X was a 7 player in impact because my number said so” because all it means is my estimation puts them here, 0% chance I agree with any of these metrics exactly I mean this one (Spoiler Alert) hates AD and as a huge AD fan quite literally 0% change in my opinion on that man lol. My estimation and EPM tend to not love AD while LEBRON has him around top 5, whereas Mine and EPM Love Bron and LEBRON has him at like 19th, its just how these things go sometimes.

Personally I think both of them are both easily top 10, and top 5 in the playoffs, (#1 and #2 this year btw with a Young Pat Riley with a Calculator Presence and Drip at the helm btw) but I live in LA (Although I’m willing to relocate for any WNBA or NBA team if I can’t get a return offer pls im desperate lmao look at all this I will literally work for on a fry cook salary to make up for the VISA lol).

Little side note, Also RAPM tends to run a bit differently on Python or R, I know J.E. RAPM and the Ryan Davis LA RAPM is very different from the one on BBI, BBI recently has done more complex stuff with their 3 point shooting luck adjustments from what I know (I know some people love and some people hate it, not gonna get into that yet). But i honestly feel like its a bit weird to see alreadly luck adjusted O-RAPM from Ryan Davis have Jokic as clear #1 and Giannis around 5-6 ish over the last 2 years, and then BBI O-LA_RAPM has Giannis as like country mile 1st on offense and Jokic is like 3rd and 6th.

This isn’t to say “NYEHEHEHE They did it Wrong!", its just the biggest example for one jump I could think of, getting into which RAPM set has the most “errors” can be a dicey proposition, and im not opening that can of worms, but my main point is some of the changes seem too dramatic for a slight adjustment ON the luck adjustments to an already luck adjusted set, especially for something where testing I’ve seen (shown later) seemed to indicate those adjustments didn’t provide super significant improvements. At the very least, I think weighing the weight of the assumptions to be made vs the practical results if they cause this big jumps has to be considered. To be clear, LOVE the LEBRON metric and think it and EPM are relatively close and both the undisputed top right now.

SKIP HERE FOR THE METRIC ANALYSIS AND BREAKDOWN:

Now that that part is done its time to get into the “fun” part, whats my metric!

First, I would like to thank Nathan Hollenberg, Seth Partnow, and Benjamin Alamar. I didn’t talk to Mr.Partnow or Mr.Alamar about this metric or anything, but I got to talk to them a bit during the Vegas seminar (just about data as a whole) and they were just super smart and cool and insightful to listen to. I had a coffee chat with Mr.Hollenberg, I had a bit of a plan about the metric by then and he gave me some advice for how long RAPM should be and he gave me advice and I do think that the reassurance that I wasn’t just being insane and my thought process wasn’t absurd or anything was a big push of confidence I feel. The coffee chat was super cool and he was just a really nice guy and I learned a ton about how to approach all of this and it did kind of make me think hmm this might actually be a cool idea. Also helped me a ton at being better at my internship with the advice he gave me!

Would also like to thank (I feel werid calling him Cranjis and I learned his whole name by accident and there is ZERO percent chance im ever gonna reveal that information to anyone so Im going to say Tim), generally helped me out alot in terms of combining Data and Xs and Os stuff which i still think im probably better at than data stuff although doing this project over the past few days was pretty fun, and Jeremias Engelmann for being a fantastic resource through his postings on APBR, also for being the reason I found RAPM when I was 15 on his dropbox links lol.

Of course I have to thank Eli Horowitz because if I didn’t get this Sparks internship I probably would have had to shift gears by now, and the Sparks experience has been absolutely amazing and I’ve really fallen in love with the entire process of being in an analytics department. I could write a whole essay on how that’s been such life changer for me (Well unless im deported within the next 90 days) but this is already going to be crazy long lol.

So, the Results will be down below, but I would like to go into what is the value add of what I made based on what I said about all in ones previously. Like I said in my Linkedin post, this is a rough Draft, but overall it tested very well when I compared it to EPM and LEBRON. My testing on EPM and LEBRON was based on the methodology Krishna Narsu (LEBRON creator) did on twitter.