Understanding Scoring Propensity: A Mixed Model Approach to Evaluating NBA Players

“Who’s the best scorer in the NBA?” is a question that comes up a lot during conversations with my friends. Names like LeBron James, James Harden, and Steph Curry always come up.

It’s often difficult to settle on a single answer, and the question becomes more nuanced once we distinguish between types of scorers. How do we compare talent while taking into account the different situations in which players score?

For example, how do we compare two players like this:

Both players are prolific shot makers, but they score in different ways. How can we put them on common ground?

*Note: This analysis does not use free-throw rate or free-throw percentage. A more accurate title would be best “shot maker”, since we only consider attempts on which the player is not fouled.*

A Simple Approach: Points Per Shot

Naively, our first argument for “best scorer” might hinge on using points per shot (PPS) as a means of analysis. After all, whoever scores the most points per attempt should theoretically be the best scorer. There are drawbacks to this approach, but we will address those later.

Collecting data from the 2014–2015 NBA season from kaggle.com, we can begin some initial analysis. If we aggregate players and take their average points per attempt, we can see who the most successful and least successful shot makers are.
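As a rough illustration of the aggregation described above, here is a minimal Python sketch. The records, the player names, and the `points_per_shot` helper are all hypothetical; the real data would come from the Kaggle shot log rather than a hand-written list.

```python
from collections import defaultdict

# Hypothetical shot records: (player, points scored on the attempt).
# A miss scores 0; a make scores 2 or 3.
shots = [
    ("Player A", 2), ("Player A", 0), ("Player A", 3), ("Player A", 0),
    ("Player B", 2), ("Player B", 2), ("Player B", 0),
]

def points_per_shot(records):
    """Return {player: (attempts, points per shot)}."""
    totals = defaultdict(lambda: [0, 0])  # player -> [attempts, points]
    for player, points in records:
        totals[player][0] += 1
        totals[player][1] += points
    return {p: (n, pts / n) for p, (n, pts) in totals.items()}

pps = points_per_shot(shots)
# Rank players from highest to lowest points per shot.
for player, (attempts, value) in sorted(pps.items(), key=lambda kv: -kv[1][1]):
    print(f"{player}: {value:.2f} PPS on {attempts} attempts")
```

Keeping the attempt count next to each PPS value is a deliberate choice: the same PPS means something very different on 7 attempts than on 700.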

Top 10 PPS (2014–2015)

From the list on the left, we can see that the most efficient scorers are big men who catch lobs (Jordan/Chandler). Mixed in are sharpshooting 3-point shot makers like Korver and Babbitt.

If we were naive, we’d simply conclude that these were the best “shot-makers” in the league. However, from our basketball knowledge, we know that shots closer to the rim (i.e. the dunks and lobs that Jordan/Chandler receive) are much easier to make than shots further away. We also know that open shots are easier to make than highly contested ones. Let’s check this intuition against the data to make sure we are on the right track.

Shot Distance vs Def Distance (yellow indicates make, purple indicates miss)

If we look at the scatter plot above, we can see that players are more likely to miss as they are further away from the basket. When they are closer to the basket and the defenders are further away, they very rarely miss. We can imagine this as a transition layup/dunk opportunity. Our intuition is correct, and we can proceed with more nuanced analysis.

A Slightly Better Approach: Logistic Regression

In order to account for these various factors (shot distance, defender distance), we can build a simple logistic regression model to predict whether a shot went in or not. Features include an intercept term, the shot distance, and the defender distance. Logistic regression assumes that the error generating process is the same for each sample. Keep this assumption in mind as we go through this approach.

Model Generation

Even with a simple model we can draw a comparison: blindly guessing on shot conversion (always predicting the more common outcome) would give us about 55% accuracy, and the model improves on that by about 5%. With more hyper-parameter tuning, I’m sure we could do better, but not dramatically so.
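To make the baseline-versus-model comparison concrete, here is a small Python sketch. It fits a logistic regression by plain gradient descent on synthetic shots, so the coefficients in `TRUE_W`, the sample size, and the learning rate are all assumptions, and the accuracies it prints are in-sample and will not match the numbers quoted above.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic shots (an assumption, not the Kaggle data): standardized shot
# distance and defender distance, with makes less likely from far away and
# more likely when the defender is far off.
TRUE_W = (-0.2, -1.0, 0.5)  # intercept, shot distance, defender distance
X, y = [], []
for _ in range(2000):
    row = (1.0, random.gauss(0, 1), random.gauss(0, 1))
    p = sigmoid(sum(w * x for w, x in zip(TRUE_W, row)))
    X.append(row)
    y.append(1 if random.random() < p else 0)

# Fit by plain batch gradient descent on the mean log-loss.
w = [0.0, 0.0, 0.0]
for _ in range(300):
    grad = [0.0, 0.0, 0.0]
    for row, label in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - label
        for j in range(3):
            grad[j] += err * row[j]
    w = [wj - 0.5 * g / len(X) for wj, g in zip(w, grad)]

# Baseline: always predict the more common outcome.
make_rate = sum(y) / len(y)
baseline_acc = max(make_rate, 1 - make_rate)

# Model: predict a make when the fitted probability is at least 0.5,
# which is the same as a non-negative linear predictor.
preds = [1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else 0 for row in X]
model_acc = sum(p == t for p, t in zip(preds, y)) / len(y)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
print("coefficients (intercept, shot dist, def dist):", [round(v, 2) for v in w])
```

Note that every shot shares the same three coefficients here, regardless of who took it. That is exactly the player-blind assumption this section asks you to keep in mind.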

Examining the residuals shows that the model does not fit well (they are not centered at 0 and are slightly skewed). This could indicate a violation of the i.i.d. assumption that was mentioned earlier.

Residuals of Fit

The violation could be due to the fact that the model is player-blind. Maybe there are different error-generating processes for each player. Essentially, each player could have a different effect on whether a shot will go in or not, based on some implied latent skill-level.

In order to address this issue, we could do one of 3 things:

  • Add in players using a one-hot encoding representation to the existing logistic regression model
  • Build a separate model for each player
  • Somehow combine the above strategies

In the 1st approach, we aren’t really treating each shot attempt from different players in their own unique way. We are still assuming that each shot comes from the same data generating process, so our assumption will still be violated, though our accuracy might improve.

In the 2nd approach, we will certainly account for different errors for different players, but we will end up building a lot of models!

The third approach, which is my preferred approach, combines the above strategies into one model. It is called Hierarchical Linear Modeling.

A Not Perfect, But Much Improved Model: HLMMs

Hierarchical linear mixed models (HLMMs) are models meant to deal with grouped data with clear hierarchical structure. For example, these models have been used to study the effect of different school districts on students’ SAT scores.

In our case, we want to see the effect of different players on shot outcomes. Our data (shot attempts) can be grouped in the following ways:

  • By game
  • By Team
  • By Player
  • By Closest Defender
  • By Quarter

Each of these groupings is meant to explain some of the variance in the data. Another way to think about this is that each of the above groups has a different effect on the outcome of a shot. For example, maybe most of the variance in the data (i.e. make or miss) is determined by the group “Player”. These types of groupings are termed “random effects”.

HLMMs consist of fixed and random effects (hence the “mixed” in the name). Fixed effects are variables that are thought to affect all groups equally. For example, in our case, the distance of the shot can be thought of as a fixed effect.

These models account for imbalances in the groupings as well. For example, let’s say that out of 10000 shots, we only have 6 data points for a bench player who does not usually play much. If we used our first approach, these 6 data points would carry very little weight in determining this player’s value. If we used our second approach and built a separate model for each player, this player’s model would be severely over-fit. Imagine if this player made all 6 shots; the model would regard him highly, without accounting for the possibility that he was simply lucky.

HLMMs find a happy medium between these two approaches. Simply put, if a member of a group does not have much data, his estimated effect will be pulled closer to the mean of all members of the group. If he has a lot of data, his estimated effect is free to sit further from that mean. This is called “shrinkage”. If you’re interested in learning more about mixed models, this link should give you a good starting point.
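The shrinkage idea can be sketched with a simple weighted average in Python. This is only an analogy: a mixed model estimates how strongly to pull toward the mean from the data, whereas the `prior_strength` constant and the league rate below are assumed values chosen by hand.

```python
def shrunk_rate(makes, attempts, league_rate, prior_strength):
    """Pull a raw make rate toward the league rate.

    prior_strength acts like a number of "pseudo-attempts" at the league
    rate; it is a hypothetical tuning constant, not something a mixed
    model asks you to set by hand.
    """
    return (makes + prior_strength * league_rate) / (attempts + prior_strength)

LEAGUE_RATE = 0.45   # assumed league-wide make rate
PRIOR_STRENGTH = 50  # assumed

bench = shrunk_rate(6, 6, LEAGUE_RATE, PRIOR_STRENGTH)          # raw rate 1.00
starter = shrunk_rate(500, 1000, LEAGUE_RATE, PRIOR_STRENGTH)   # raw rate 0.50

print(f"bench player: raw 1.00 -> shrunk {bench:.3f}")
print(f"starter:      raw 0.50 -> shrunk {starter:.3f}")
```

The bench player’s perfect 6-for-6 moves a long way toward the league rate, while the starter’s estimate barely changes because 1000 attempts outweigh the 50 pseudo-attempts.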

Applying HLMMs to Shot Data

Let’s revisit our points per shot analysis. Before, we only showed players’ points per shot, but didn’t show how many shot attempts they took. Including this information might show whether a player was “unlucky”.

On the left, we can see there is a huge discrepancy in the number of shot attempts between players. Take the difference between Stephen Curry and Hedo Turkoglu. Both have the same PPS, but Steph has 9x more attempts! Intuition tells us that Steph’s information is more trustworthy, but can we definitively say that Steph is a better shot-maker than Turkoglu using this information alone?

It turns out we can attempt to, using hypothesis testing, but this assumes that both Stephen and Hedo took the same exact types of shots. Since this is highly unlikely, we’ll ignore this route.

What we can do is build a HLMM and interpret the coefficients of the model.

HLMM Specification

Above is how a mixed model is specified. The beta terms are the coefficients of the fixed effects “X”, and the “u” terms are the coefficients of the random effects “Z”. Notice that the model is linear: a positive coefficient suggests that an increase in the associated predictor (with the other values held constant) coincides with a corresponding increase in the predicted value, “y”.
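For reference, the textbook form of a linear mixed model in this notation is usually written as below. The error term and the covariance symbols G and σ² are standard conventions assumed here, not symbols defined elsewhere in this post.

```latex
y = X\beta + Zu + \varepsilon,
\qquad u \sim \mathcal{N}(0, G),
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)
```

The fixed-effect coefficients β are shared by every observation, while each entry of u belongs to one member of a group (for example, one player) and is drawn from a common zero-mean distribution.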

In terms of basketball, we can see how a player’s coefficient relates to the predicted value, shot outcome. A better shot-maker would have a more positive coefficient.

Building the HLMM: Using Bayesian Methods

Let’s revisit our data.

Data Frame for Shot Data

Let’s look for possible groupings in our data that might introduce variance. Some that come to mind:

  • Player
  • Player’s Defender
  • Game_ID
  • Player’s Team
  • Player’s Opponent
  • Quarter

Now, we can look for effects that might be fixed across populations:

  • Home/Away game
  • Shot Distance
  • Defender Distance
  • Number of Dribbles
  • Touch Time
  • Time Left in Quarter
  • Shot Clock

Usually, when using Bayesian methodology, one would like to fit the “maximal model”. This means that if we intend to confirm or deny a hypothesis (in our case, who is the best shot maker), we should specify the maximal random effect structure. What does that mean?

Consider our features above. We could set the fixed effects as such, and specify a random intercept for each group. However, intuitively we know that shot distance, defender distance, and dribbles before the shot affect each player differently. Therefore, we could posit a structure that includes a random slope for each of the aforementioned features as well as a random intercept. Furthermore, we could posit that defender distance affects each defender differently, and include a random slope for that.

Using R’s brms package, I can fit a model with “maximal structure”, and interpret the results. Note that this will take a while depending on the number of iterations and chains you run.

Analyzing Model Results

The model I ended up fitting had this formulation in R.

FGM ~ Intercept + LOCATION + CLOSE_DEF_DIST + (1 + SHOT_DIST + DRIBBLES | Player) + (1|CLOSEST_DEFENDER)

What this means is that the response (FGM) is modeled as the sum of the fixed effects (home/away and defender distance), a random intercept for each player and each closest defender, and player-specific random slopes multiplied by shot distance and dribbles taken. Because FGM is a binary 0/1 response and the effects are transformed using a logit link function, the coefficients have to be interpreted on the log-odds scale rather than as direct changes in probability.
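Written out term by term, the formula corresponds to something like the following. The index notation is an assumption made for illustration: i indexes shots, j[i] is the player who took shot i, and d[i] is the closest defender on that shot.

```latex
\ln\frac{p_i}{1 - p_i}
  = \beta_0
  + \beta_1\,\mathrm{LOCATION}_i
  + \beta_2\,\mathrm{CLOSE\_DEF\_DIST}_i
  + u_{0,j[i]}
  + u_{1,j[i]}\,\mathrm{SHOT\_DIST}_i
  + u_{2,j[i]}\,\mathrm{DRIBBLES}_i
  + v_{d[i]}

(u_{0,j},\, u_{1,j},\, u_{2,j}) \sim \mathcal{N}(0, \Sigma),
\qquad v_d \sim \mathcal{N}(0, \tau^2)
```

Note that shot distance and dribbles appear only as player-specific slopes in this formula, without a shared fixed slope, so each player’s slope is shrunk toward zero rather than toward a league-wide slope.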

To capture the effect of a certain player with respect to other players, we have to do a little bit of math. Recall that the coefficients measure a response if the other predictors are held constant. In our specified model we have several variables to keep constant:

  • Location (Home/Away)
  • Defender Distance
  • Shot Distance
  • Dribbles
  • Defender

Since I scaled the values of defender distance, shot distance, and number of dribbles, they should be centered at 0. We will use 0 as our constant for these to make our calculations easier. For simplicity, we will assume that each player is at home. Next, we must fix a defender value; we will use the median value. Random effects are specified with a Normal(mean=0, variance=sigma^2) distribution, so the median defender effect should be 0! This makes our final calculation quite easy.

Calculating Player Effect

The logit link performs the following transformation:

ln(p/(1-p)) = βX + Zu

Recall that “u” and “β” are the coefficients of the random effects “Z” and the fixed effects “X” respectively. “p” is the probability of the successful outcome (a made shot, in our case).

In our case (since we use median values of 0), this breaks down to :

ln(p/(1-p)) = Intercept + Home + Player

I’ll call the RHS of the equation the linear_predictor. Now, solving for p, we get:

p = e^(linear_predictor)/(1+e^(linear_predictor))
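The conversion from linear predictor to probability can be sketched in Python as follows. The intercept, home, and player values are hypothetical numbers on the log-odds scale, not estimates from the fitted model.

```python
import math

def inv_logit(linear_predictor):
    """Map a log-odds value to a probability.

    1 / (1 + e^(-x)) is algebraically the same as e^x / (1 + e^x).
    """
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical values on the log-odds scale (not fitted estimates).
intercept = -0.20
home = 0.02
player_effects = {"Player A": 0.15, "Player B": -0.10}

for name, effect in player_effects.items():
    p = inv_logit(intercept + home + effect)
    print(f"{name}: estimated make probability {p:.3f}")
```

Because the intercept and home terms are the same for everyone, any difference between the printed probabilities comes from the player term alone.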

We can now compare the player effect on shot conversion in terms of probability!

Since we went Bayesian, that’s not all we can do. We can actually look at the probability distribution of each player’s effects using the samples generated. This is much better than using a single value, since we can compare uncertainties in estimates.

I’ll illustrate with an example below, revisiting the case of Stephen Curry and Hedo Turkoglu. Recall that Stephen and Hedo had equivalent PPS, but Stephen had 9x more shot attempts.

Player Effects

In the orange, we have the distribution of shot conversion values for Turkoglu, and in the blue, we have the same for Curry.

Some things to take away:

  • The distribution for Turkoglu is more spread out. This means we have less certainty about his effect, which makes sense since we had only ~100 shot attempts from him. By contrast, Curry’s distribution is tighter, which implies the model is more certain about his effect.
  • The peak of Turkoglu’s distribution is shifted to the left of Curry’s. This indicates that the model estimates Curry is more likely to have a greater effect on shot conversion than Turkoglu! We can go further and measure how often Curry’s effect is higher: we simply take the fraction of posterior samples in which Curry’s “x” value is greater than Turkoglu’s. For the samples above, that fraction is 91.875%, so it is about 92% likely that Steph possesses a better conversion effect than Turkoglu.
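The comparison described in the bullets above amounts to counting posterior draws. Here is a minimal Python sketch; the simulated samples are stand-ins with assumed means and spreads, so the printed probability is illustrative and unrelated to the 91.875% figure.

```python
import random

def prob_greater(samples_a, samples_b):
    """Fraction of paired posterior draws in which a exceeds b."""
    pairs = list(zip(samples_a, samples_b))
    return sum(a > b for a, b in pairs) / len(pairs)

# Simulated stand-ins for posterior draws of make probability (assumed
# values): one tight distribution and one wide distribution.
random.seed(1)
tight = [random.gauss(0.47, 0.01) for _ in range(4000)]
wide = [random.gauss(0.44, 0.03) for _ in range(4000)]

print(f"P(tight > wide) is about {prob_greater(tight, wide):.3f}")
```

Pairing draw by draw matters when the samples come from the same fit, because each draw represents one jointly plausible set of player effects.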

We can repeat this process for all combinations of players and come up with comparisons, but it is more useful to think of how a player increases the probability of a shot going in, compared to others. I won’t be speaking in distributions; instead I’ll just use the median value of the samples.

We can then subtract the median of conversion for all players from each player’s effects, and look at the probability of conversion above replacement for each player.
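The subtraction step can be sketched in a few lines of Python. The player names and conversion probabilities are hypothetical placeholders for the per-player posterior medians.

```python
from statistics import median

# Hypothetical per-player median make probabilities (not model output).
conversion = {
    "Player A": 0.48,
    "Player B": 0.45,
    "Player C": 0.43,
    "Player D": 0.41,
    "Player E": 0.46,
}

# The league reference point is the median across all players.
league_median = median(conversion.values())
above_replacement = {
    name: p - league_median for name, p in conversion.items()
}

# List players from most to least above the league median.
for name, delta in sorted(above_replacement.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {delta:+.3f}")
```

Using the median rather than the mean as the reference keeps a handful of extreme players from shifting the baseline.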

Top 20 Shot Converters

We can see the list to the left for the top 20 and bottom 20 shot converters for the 2014–2015 season. What do we notice? What stands out? How can we interpret these results in terms of the model we built? Why do we see certain players above others?

Bottom 20 Shot Converters

Final Remarks

The data is somewhat noisy. The closest_defender might just be a player who was helping over, and not one who actually contested the shot. The data lacks the type of shot a player took (jump shot, dunk, hook shot). It also lacks the density around a player when he took the shot (was the lane congested, or was it quite clear with only one defender?).

We notice that the big men are penalized quite heavily; they are frequently at the bottom of the replacement level chart. This might be because these players frequently engage in tip-ins, which usually do not convert on the first attempt. Again, having more granular data (including score differential to weed out “clutch” players) would be beneficial.

Leave your comments below. Remember that this is only for the 2014–2015 season, but the process can be applied to today’s data as well.

Acknowledgements

I’d like to acknowledge Alex Hayes for his assistance in understanding the variance-covariance structure of Mixed Models in a Bayesian setting. I’d also like to acknowledge my friends, who continuously engage in heated debates and give me the inspiration to tackle those debates analytically. Finally, I’d like to acknowledge Stephen Curry for his greatness.


Understanding Scoring Propensity: A Mixed Model Approach to Evaluating NBA Players was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
