Matt Anikiej

Defense Quality: A New Way of Measuring Defensive Impact in the NHL

One of the hardest things to measure in the NHL is the impact good defense has on a game. There aren’t many counting stats that can directly point to good defense, rather, good defense often shows up as a lack of production from an offensive player. There’s been many attempts to create solid defensive stats, and many excellent ones as well. I want to pose a new stat called Defense Quality, that measures a player’s impact compared to the rest of the league. Defense Quality addresses gaps in current defense metrics by focusing on the amount of scoring chances a player gives up rather than goals, and rewarding efficiency rather than single game performance.

Current Stats

Defense Quality is not the first stat that measures defensive impact, and not even the first one focused on scoring chances. Corsi and Fenwick do a great job of showing a player’s impact, and xG models are incredible at showing tangible results. Dom Luszczyszyn from The Athletic has a great Defensive Rating metric that is based largely on xG and per game stats. It’s measured in goals, and can be translated into a WAR metric. It is an excellent stat for measuring the results of a game, and this is where Defense Quality differs from Defensive Rating.

There are three main gaps in Defense Rating that Defense Quality aims to fill.

1. Defensive Rating does not take into account how a goal was scored.

While all goals are not necessarily treated equal as it uses G - xG, it still only focuses on the outcome of a play. Defense Quality aims to fix this by focusing on the quality of scoring chances opposing teams have, while the player is on the ice.

2. Defensive Rating focuses on per game statistics rather than efficiency.

Per game statistics are good when focusing on positive stats such as, hits, blocks, etc. However, a top pair defenseman is going to be playing more minutes against better opposition, than a bottom pair defenseman, which will inevitably inflate their negative stats such as goals, chances, etc. This makes it difficult to correctly scale a player’s “per 60” impact. However, I’ve found that incorporating the player’s ice time and situation has pretty successfully addressed these concerns.

3. Defensive Rating neglects a goalie’s impact on goals.

Using xG is great for analyzing the results of a game and a player’s impact. However, it can unfairly punish good defense because of poor goaltending. Defense Quality does the opposite, and ignores the result in favor of focusing on the quality of the scoring chance itself.

Together, Defensive Rating and Defense Quality give a much more nuanced understanding of a player’s impact on defense. I want to emphasize that Defense Quality is not meant to replace Defensive Rating. Fundamentally, they measure different things. Hockey games are won by scoring goals and not generating good opportunities, so it’s important to analyze the results of a game like Defensive Rating does. Defense Quality is meant to address gaps in the information of Defensive Rating, and provide a further in depth analysis of the player’s impact on defense. Since we want numbers accurate to today, we will be basing this metric on data from the 2024-25 season.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("/Users/mattanikiej/nhl-stats/data/2024-25/skaters.csv")

data.head()
playerId season name team position situation games_played icetime shifts gameScore ... OffIce_F_xGoals OffIce_A_xGoals OffIce_F_shotAttempts OffIce_A_shotAttempts xGoalsForAfterShifts xGoalsAgainstAfterShifts corsiForAfterShifts corsiAgainstAfterShifts fenwickForAfterShifts fenwickAgainstAfterShifts
0 8478047 2024 Michael Bunting NSH L other 76 2237.0 37.0 26.19 ... 7.28 10.09 72.0 87.0 0.00 0.00 0.0 0.0 0.0 0.0
1 8478047 2024 Michael Bunting NSH L all 76 70819.0 1474.0 43.70 ... 161.54 187.75 3221.0 3522.0 0.00 0.00 0.0 0.0 0.0 0.0
2 8478047 2024 Michael Bunting NSH L 5on5 76 59813.0 1294.0 43.70 ... 112.73 122.08 2661.0 2707.0 0.71 1.71 19.0 43.0 16.0 31.0
3 8478047 2024 Michael Bunting NSH L 4on5 76 6.0 2.0 2.58 ... 0.20 0.17 4.0 11.0 0.00 0.00 0.0 0.0 0.0 0.0
4 8478047 2024 Michael Bunting NSH L 5on4 76 8763.0 141.0 36.88 ... 23.81 2.60 311.0 54.0 0.00 0.01 0.0 1.0 0.0 1.0

5 rows × 154 columns

Defense Quality Formula

So what is Defense Quality? It is a weighted average of a player’s defensive stats. There are 16 total and they’re separated into 6 groups. Each gro’up’s total weight, is the sum of the individual statistics in that group.

1. Ice Time (5%)

This is used to weight a player’s ice time, so as not to over reward great shifts in small sample sizes.

2. Shot Suppression (27%)

This has the highest weight and importance in Defense Quality. We want to measure the quality of offense the opposition has, and therefore Defense Quality is very heavily focused on suppressing scoring chances. Scoring chances are much more tangible than xG, and much easier to understand. Although this doesn’t address the biggest issue with the blackbox of xG sometimes (expected by whom?) since there are a million and one ways to define the danger of scoring chances, it still provides good information on preventing offense. Essentially, it’s a more in depth Corsi Against.

For those wondering, this dataset used MoneyPuck’s formula since it’s very easy to understand: High Danger is a shot with >= 20% chance of a goal, 20% > Medium >= 8%, 8% > Low. Natural Stat Trick has a danger calculation as well.

3. Physical Defense (15%)

Regardless of the quality of the chance, a blocked shot is potentially as good a save. Anytime a defenseman can stop a puck from getting to the net is good defense and must be rewarded.

4. Puck Management (20%)

A giveaway is as disastrous as a takeaway is good, which is why they’re weighted the same. A giveaway in the defensive zone is a blunder that can cost games. 10% may seem low for that reason, but keep in mind it’s getting punished twice (Giveaway + Defensive Zone Giveaway), and potentially a third time if it leads to a scoring chance.

5. Shift Quality (17%)

Being able to break the puck out is good defense. That’s why starting in your defensive zone, and ending in the offensive zone gets rewarded. Understanding where a shift starts and ends also lets us imply some things that can’t be measured. Starting in your defensive zone often implies good defense, as the coach trusts the player’s defensive impact. Ending in the offensive zone, implies a good and efficient break out, as the player wasn’t gassed yet and didn’t come to the bench. Ending in the neutral zone or on the fly is rewarded, but might have taken a while so it’s not as rewarding as a quick transition from defense to offense. Is this a perfect way of measuring a breakout? No, but the NHL Edge data does not have breakout statistics and impact, and I’ve found this is a good work around that fits in 90% of situations.

6. Penalty Differential (16%)

Taking a penalty is often due to lazy defense or getting beat. However it happened, a penalty kill puts the player’s team in an extremely dangerous situation is treated as such. Conversely, drawing penalties usually happens during good offense, and it unfairly skewed towards offensive players. For example, when weighted equally, Jack Hughes was a top 5 defensive forward. It’s still rewarded, but at a much lower weight than taking a penalty is penalized.

# All stats needed for Defensive Rating calculation
defensive_stats = [
    # Player Info
    'name',
    'team',
    'position',
    'games_played',
    
    # Ice time responsibility
    'icetime',

    # Shot Suppression
    'OnIce_A_highDangerShots',
    'OnIce_A_mediumDangerShots',
    'OnIce_A_lowDangerShots',
    'OnIce_A_shotAttempts',
    
    # Physical Defense
    'shotsBlockedByPlayer',
    
    # Puck Management
    'I_F_takeaways',
    'I_F_giveaways',
    'I_F_dZoneGiveaways',
    
    # Shift Quality
    'I_F_dZoneShiftStarts',
    'I_F_dZoneShiftEnds',
    'I_F_oZoneShiftEnds',
    'I_F_neutralZoneShiftEnds',
    'I_F_flyShiftEnds',
    
    # Penalty Differential
    'penalties',
    'penaltiesDrawn',
    
]

print(f"Total stats used: {len(defensive_stats) - 4}")

Total stats used: 16
data_5on5 = data[data['situation'] == '5on5']

data_5on5.head()
playerId season name team position situation games_played icetime shifts gameScore ... OffIce_F_xGoals OffIce_A_xGoals OffIce_F_shotAttempts OffIce_A_shotAttempts xGoalsForAfterShifts xGoalsAgainstAfterShifts corsiForAfterShifts corsiAgainstAfterShifts fenwickForAfterShifts fenwickAgainstAfterShifts
2 8478047 2024 Michael Bunting NSH L 5on5 76 59813.0 1294.0 43.70 ... 112.73 122.08 2661.0 2707.0 0.71 1.71 19.0 43.0 16.0 31.0
7 8480950 2024 Ilya Lyubushkin DAL D 5on5 80 70786.0 1535.0 10.38 ... 121.70 119.55 2715.0 2625.0 8.56 0.32 155.0 9.0 123.0 6.0
12 8477369 2024 Carson Soucy NYR D 5on5 75 69895.0 1609.0 8.07 ... 95.12 95.00 2385.0 2269.0 5.71 1.21 107.0 19.0 75.0 15.0
17 8481518 2024 Nolan Foote NJD L 5on5 7 4075.0 100.0 1.92 ... 11.78 9.15 283.0 217.0 0.09 0.02 4.0 2.0 3.0 1.0
22 8477964 2024 Ivan Barbashev VGK C 5on5 70 64523.0 1217.0 49.58 ... 108.42 103.12 2432.0 2368.0 0.90 0.33 23.0 21.0 17.0 13.0

5 rows × 154 columns

defense_5on5 = data_5on5[defensive_stats]

defense_5on5.head()
name team position games_played icetime OnIce_A_highDangerShots OnIce_A_mediumDangerShots OnIce_A_lowDangerShots OnIce_A_shotAttempts shotsBlockedByPlayer I_F_takeaways I_F_giveaways I_F_dZoneGiveaways I_F_dZoneShiftStarts I_F_dZoneShiftEnds I_F_oZoneShiftEnds I_F_neutralZoneShiftEnds I_F_flyShiftEnds penalties penaltiesDrawn
2 Michael Bunting NSH L 76 59813.0 47.0 127.0 481.0 923.0 17.0 11.0 51.0 11.0 79.0 191.0 201.0 166.0 736.0 26.0 25.0
7 Ilya Lyubushkin DAL D 80 70786.0 49.0 165.0 637.0 1196.0 109.0 17.0 103.0 67.0 208.0 193.0 202.0 168.0 972.0 15.0 8.0
12 Carson Soucy NYR D 75 69895.0 48.0 137.0 643.0 1154.0 88.0 17.0 71.0 47.0 172.0 198.0 178.0 175.0 1058.0 18.0 4.0
17 Nolan Foote NJD L 7 4075.0 1.0 8.0 26.0 49.0 1.0 1.0 2.0 0.0 8.0 13.0 15.0 10.0 62.0 0.0 0.0
22 Ivan Barbashev VGK C 70 64523.0 52.0 127.0 607.0 1091.0 30.0 18.0 73.0 23.0 129.0 176.0 202.0 194.0 645.0 4.0 12.0
defense_5on5.describe()
games_played icetime OnIce_A_highDangerShots OnIce_A_mediumDangerShots OnIce_A_lowDangerShots OnIce_A_shotAttempts shotsBlockedByPlayer I_F_takeaways I_F_giveaways I_F_dZoneGiveaways I_F_dZoneShiftStarts I_F_dZoneShiftEnds I_F_oZoneShiftEnds I_F_neutralZoneShiftEnds I_F_flyShiftEnds penalties penaltiesDrawn
count 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000 920.000000
mean 51.330435 42484.634783 29.608696 85.815217 369.086957 682.005435 36.035870 11.857609 37.316304 17.343478 97.647826 122.763043 123.059783 121.967391 537.643478 8.369565 7.692391
std 29.600450 27703.787686 20.653026 57.658355 243.716931 448.245700 32.277999 9.683363 27.815476 17.069322 71.697465 77.491397 78.321758 83.790031 347.767046 7.315994 6.781844
min 1.000000 70.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000
25% 21.000000 13766.500000 9.000000 28.000000 119.000000 223.750000 10.000000 3.000000 10.000000 4.000000 27.000000 45.000000 45.000000 36.500000 185.750000 2.000000 2.000000
50% 63.000000 48113.000000 31.000000 91.000000 411.000000 754.000000 29.000000 11.000000 37.000000 13.000000 98.500000 136.000000 141.000000 131.500000 618.500000 7.000000 7.000000
75% 78.000000 65117.000000 46.000000 132.000000 563.250000 1038.500000 50.000000 18.000000 57.000000 23.000000 151.000000 186.000000 185.000000 191.000000 799.500000 12.000000 12.000000
max 85.000000 104612.000000 94.000000 228.000000 987.000000 1779.000000 179.000000 50.000000 140.000000 104.000000 314.000000 311.000000 302.000000 318.000000 1471.000000 45.000000 35.000000

So how are these stats being compared? There’s a few things we have to do first. In order to reduce the impact of small sample sizes, we need to establish a metric that will make a player qualify for Defense Quality. Think quality starts in baseball, or minimum games for leaderboards. Defense Quality requires a player to average 10 minutes a game, and have played in at least half of the games that season. This ensures we are eliminating players that had great shifts in small roles, which will heavily skew the per 60 stats.

Also, the defensive role for forwards and defenseman is much different, so each stat is normalized by position, and will also be split accordingly later. For this exercise, all forwards will be treated as a forward, regardless of actual position. I understand there are different responsibilities, but we’ll see later how forwards might need to have some different weights to their formulation anyway.

To calculate Defense Quality we will take an approach very similar to my first blog about finding the NHL’s lucky charm and build a composite z-score. This will bring everything together onto the same scale, and allow us to compare to the average player much better. I wanted to use this blog to highlight various statistical techniques and how they can be applied to hockey analytics, but I believe a composite z-score is most appropriate for this case.

defense_5on5.loc[:, ['icetime_minutes']] = defense_5on5['icetime'] / 60

defense_10_minutes = defense_5on5[
    (defense_5on5['icetime_minutes'] / defense_5on5['games_played'] >= 10)
    & (defense_5on5['games_played'] >= 41)
]

defense_10_minutes.describe()
games_played icetime OnIce_A_highDangerShots OnIce_A_mediumDangerShots OnIce_A_lowDangerShots OnIce_A_shotAttempts shotsBlockedByPlayer I_F_takeaways I_F_giveaways I_F_dZoneGiveaways I_F_dZoneShiftStarts I_F_dZoneShiftEnds I_F_oZoneShiftEnds I_F_neutralZoneShiftEnds I_F_flyShiftEnds penalties penaltiesDrawn icetime_minutes
count 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000 560.000000
mean 72.032143 61838.539286 43.216071 124.976786 536.716071 991.246429 52.085714 17.625000 54.703571 25.301786 142.600000 175.857143 175.992857 179.337500 775.741071 11.814286 11.055357 1030.642321
std 10.909895 14681.047776 13.481642 34.357759 138.546995 250.912161 30.962436 7.817147 20.528735 17.048914 50.451386 42.152517 42.772686 49.356996 192.651991 6.755464 6.230700 244.684130
min 41.000000 26425.000000 12.000000 42.000000 203.000000 382.000000 8.000000 3.000000 15.000000 3.000000 23.000000 69.000000 65.000000 57.000000 290.000000 0.000000 0.000000 440.416667
25% 67.000000 52770.250000 34.000000 101.750000 446.000000 819.000000 29.000000 12.000000 39.000000 12.000000 107.000000 147.000000 146.000000 142.750000 646.750000 7.000000 6.750000 879.504167
50% 76.000000 61491.000000 43.000000 124.000000 533.000000 985.000000 42.500000 17.000000 52.000000 19.000000 140.500000 176.500000 178.000000 180.000000 768.000000 10.000000 10.000000 1024.850000
75% 81.000000 70570.500000 52.000000 147.000000 626.250000 1158.000000 69.000000 22.000000 67.000000 36.000000 172.250000 205.000000 205.000000 215.000000 889.500000 15.250000 15.000000 1176.175000
max 85.000000 104612.000000 94.000000 228.000000 987.000000 1779.000000 179.000000 50.000000 140.000000 104.000000 314.000000 311.000000 302.000000 318.000000 1471.000000 45.000000 35.000000 1743.533333
# Normalize all defensive stats by ice time (per 60 minutes)
defense_normalized = defense_10_minutes.copy()

# Convert ice time to hours for per-60 calculations
icetime_hours = defense_normalized['icetime'] / 3600

# Calculate ice time per game (will be normalized by position)
defense_normalized['icetime_per_game'] = defense_normalized['icetime'] / defense_normalized['games_played']

# Normalize all stats
stats_to_normalize = [stat for stat in defensive_stats if stat not in ['icetime', 'icetime_per_game', 'games_played', 'name', 'team', 'position']]

for stat in stats_to_normalize:
    defense_normalized[f'{stat}_per60'] = defense_normalized[stat] / icetime_hours

# Standardize the per60 stats BY POSITION (position-specific z-scores)
per60_stats = [f'{stat}_per60' for stat in stats_to_normalize] + ['icetime_per_game'] + ['name', 'team', 'position']

# Transform C, L, R positions to F (Forward)
defense_normalized['position'] = defense_normalized['position'].replace({'C': 'F', 'L': 'F', 'R': 'F'})

for stat in per60_stats:
    if stat not in ['name', 'team', 'position']:
        # Calculate position-specific mean and std, then standardize within position
        defense_normalized[stat] = defense_normalized.groupby('position')[stat].transform(
            lambda x: (x - x.mean()) / x.std()
        )

# Show the normalized data
defense_normalized.head()

name team position games_played icetime OnIce_A_highDangerShots OnIce_A_mediumDangerShots OnIce_A_lowDangerShots OnIce_A_shotAttempts shotsBlockedByPlayer ... I_F_takeaways_per60 I_F_giveaways_per60 I_F_dZoneGiveaways_per60 I_F_dZoneShiftStarts_per60 I_F_dZoneShiftEnds_per60 I_F_oZoneShiftEnds_per60 I_F_neutralZoneShiftEnds_per60 I_F_flyShiftEnds_per60 penalties_per60 penaltiesDrawn_per60
2 Michael Bunting NSH F 76 59813.0 47.0 127.0 481.0 923.0 17.0 ... -1.021651 0.177342 -0.889666 -1.388037 0.477589 0.411046 -0.691069 -0.119593 2.001072 1.887581
7 Ilya Lyubushkin DAL D 80 70786.0 49.0 165.0 637.0 1196.0 109.0 ... -0.466159 2.662645 2.187086 1.514781 -0.022019 0.425182 -0.953497 0.612857 0.235820 -0.016663
12 Carson Soucy NYR D 75 69895.0 48.0 137.0 643.0 1154.0 88.0 ... -0.434646 0.261487 0.277254 0.522278 0.185438 -0.079843 -0.510809 1.573115 0.682549 -0.901796
22 Ivan Barbashev VGK F 70 64523.0 52.0 127.0 607.0 1091.0 30.0 ... -0.003845 1.646416 1.226863 -0.490188 -0.524670 0.040595 -0.058717 -1.842489 -1.174090 -0.354278
27 Egor Zamula PHI D 63 56910.0 28.0 102.0 443.0 863.0 82.0 ... -0.586896 -0.392575 -0.122937 -0.940028 -0.096327 0.182298 0.653050 0.727374 -1.322950 -1.252205

5 rows × 37 columns

plt.figure(figsize=(10, 6))
sns.histplot(defense_normalized['OnIce_A_highDangerShots_per60'], bins=20, kde=True)
plt.title(f'Normalized Distribution of High Danger Shots Per 60')
plt.xlabel('High Danger Shots Per 60')
plt.show()

png

per60_df = defense_normalized[per60_stats]

per60_df.describe()

OnIce_A_highDangerShots_per60 OnIce_A_mediumDangerShots_per60 OnIce_A_lowDangerShots_per60 OnIce_A_shotAttempts_per60 shotsBlockedByPlayer_per60 I_F_takeaways_per60 I_F_giveaways_per60 I_F_dZoneGiveaways_per60 I_F_dZoneShiftStarts_per60 I_F_dZoneShiftEnds_per60 I_F_oZoneShiftEnds_per60 I_F_neutralZoneShiftEnds_per60 I_F_flyShiftEnds_per60 penalties_per60 penaltiesDrawn_per60 icetime_per_game
count 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02 5.600000e+02
mean -1.871519e-16 5.202188e-16 5.963484e-16 -1.122911e-15 -1.649474e-16 -1.173664e-16 -2.474211e-16 -4.472613e-16 -3.219647e-16 3.425831e-16 -4.821540e-16 -1.015061e-16 1.358437e-15 -3.489272e-17 1.205385e-16 6.978545e-17
std 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01 9.991051e-01
min -2.687816e+00 -2.668635e+00 -3.033788e+00 -3.046615e+00 -2.167089e+00 -2.255737e+00 -2.336807e+00 -2.624505e+00 -2.785129e+00 -2.850734e+00 -2.698309e+00 -3.282014e+00 -2.678271e+00 -1.702241e+00 -2.151523e+00 -2.902965e+00
25% -6.947883e-01 -6.852118e-01 -6.871908e-01 -6.537320e-01 -7.168265e-01 -7.020389e-01 -7.273986e-01 -6.928125e-01 -6.018277e-01 -7.048456e-01 -6.451592e-01 -7.043824e-01 -6.867701e-01 -7.070728e-01 -7.048873e-01 -7.599645e-01
50% -1.037711e-02 -7.539434e-03 7.647053e-03 -2.551437e-02 -6.326369e-02 -1.234383e-01 -3.107141e-03 -2.809440e-02 -6.925049e-02 2.578623e-02 3.835170e-02 8.386918e-04 -7.484519e-02 -1.946170e-01 -9.305307e-02 4.888170e-02
75% 6.706291e-01 6.180114e-01 6.046613e-01 7.045182e-01 5.889090e-01 6.210524e-01 6.316330e-01 6.516388e-01 5.267702e-01 6.612892e-01 7.017442e-01 6.527776e-01 6.423752e-01 4.451587e-01 5.574157e-01 7.360586e-01
max 3.744299e+00 3.772877e+00 4.038967e+00 3.969688e+00 4.371575e+00 3.501708e+00 3.405730e+00 4.421493e+00 4.322669e+00 2.840199e+00 2.574751e+00 3.228487e+00 3.746914e+00 4.415791e+00 4.170701e+00 3.302208e+00
# All stats needed for Defensive Rating calculation
defensive_weights = {
    # Ice time responsibility (higher is better - rewards players who play more)
    'icetime_per_game': 5,
    
    # Shot Suppression (Lower is better)
    'OnIce_A_highDangerShots_per60': -15,
    'OnIce_A_mediumDangerShots_per60': -10,
    'OnIce_A_lowDangerShots_per60': -1,
    'OnIce_A_shotAttempts_per60': -1,
    
    # Physical Defense (Higher is better)
    'shotsBlockedByPlayer_per60': 15,
    
    # Puck Management (Lower is better)
    'I_F_takeaways_per60': 5,
    'I_F_giveaways_per60': -5,
    'I_F_dZoneGiveaways_per60': -10,
    
    # Shift Quality (Higher is better, except defense ends)
    'I_F_dZoneShiftStarts_per60': 5,
    'I_F_dZoneShiftEnds_per60': -5,
    'I_F_oZoneShiftEnds_per60': 5,
    'I_F_neutralZoneShiftEnds_per60': 1,
    'I_F_flyShiftEnds_per60': 1,
    
    # Penalty Differential (Lower is better)
    'penalties_per60': -15,
    'penaltiesDrawn_per60': 1,
}

Mathematical Formula

Defense Quality is calculated as a weighted composite z-score:

\[DQ = \sum_{i=1}^{n} \frac{w_i}{\sum_{j=1}^{n}|w_j|} \cdot z_i\]

Where:

The z-scores are calculated separately by position (Forward vs Defenseman):

\[z_i = \frac{x_i - \mu_{\text{position}}}{\sigma_{\text{position}}}\]

Where $x_i$ is the per-60 rate statistic, $\mu_{\text{position}}$ is the position-specific mean, and $\sigma_{\text{position}}$ is the position-specific standard deviation.

s = sum(np.abs(defensive_weights[stat]) for stat in defensive_weights.keys())
defense_quality = sum((defensive_weights[stat] / s) * per60_df[stat] for stat in defensive_weights.keys())

per60_df.loc[:, ['defense_quality']] = defense_quality

# Add raw ice time data for reference
per60_df.loc[:, ['icetime']] = defense_normalized['icetime']
per60_df.loc[:, ['games_played']] = defense_normalized['games_played']
per60_df.loc[:, ['icetime_per_game']] = defense_normalized['icetime_per_game']

plt.figure(figsize=(10, 6))

sns.histplot(per60_df['defense_quality'], bins=20, kde=True)
plt.title('Distribution of Defense Quality')
plt.xlabel('Defense Quality')

plt.tight_layout()
plt.show()

png

Results and Interpretation

We finally have our new statistic! Whew, this was an intense one but we’re finally at the fun part! If we look at the distribution of forwards and defensemen, we can see they’re both mostly normal with a very slight skew left. This is to be expected, most players are average defensively, some great, some awful. On a macro level, it looks good! Let’s dive deep to see if it’s accurate.

defense_dq = per60_df[per60_df['position'] == 'D']
forward_dq = per60_df[per60_df['position'] == 'F']
plt.figure(figsize=(10, 6))

plt.subplot(1, 2, 1)
sns.histplot(defense_dq['defense_quality'], bins=20, kde=True)
plt.title('Distribution of Defense Quality (Defensemen)')

plt.subplot(1, 2, 2)
sns.histplot(forward_dq['defense_quality'], bins=20, kde=True)
plt.title('Distribution of Defense Quality (Forwards)')

plt.tight_layout()
plt.show()

png

The defensive top 10 looks very accurate. Spurgeon is an absolute lock down defenseman, and right behind him is the LA King’s top pairing. The only surprise to me in the top 10 is Rasmus Ristolainen. He’s had an up and down career, but looking at the numbers, he had a great year last season (+3, 94 BLK, 25 TAKE) that was unfortunately derailed by injuries.

display_stats = ['name', 'team', 'position', 'defense_quality']
d_top_10 = defense_dq[display_stats].sort_values(by='defense_quality', ascending=False).head(10).reset_index(drop=True)
d_top_10.index += 1

d_top_10.round(3)
name team position defense_quality
1 Jared Spurgeon MIN D 0.970
2 Vladislav Gavrikov LAK D 0.865
3 Mikey Anderson LAK D 0.812
4 Chris Tanev TOR D 0.800
5 Artem Zub OTT D 0.797
6 Rasmus Ristolainen PHI D 0.744
7 Colton Parayko STL D 0.714
8 Jaccob Slavin CAR D 0.687
9 Jake Sanderson OTT D 0.655
10 Adam Pelech NYI D 0.620

Remember how I said the weights might need to be adjusted for forwards? While the defensive metric is great for defensemen, it fails to accurately assess defense for forwards in my opinion. The list is mainly third line players, generally whose role is to be good defensivley and shut down an opposing team’s top line. This makes sense why players such as Logan O’Connor are at the top. In that sense, it’s good even if I don’t agree with the top ranking. However, I can’t help but feel like Sam Reinhart (12) should not be above Aleksander Barkov (16). There’s other questionable rankings in here, but overall I think it’s a solid foundation that can (and should) be built upon for forwards.

f_top_10 = forward_dq[display_stats].sort_values(by='defense_quality', ascending=False).head(10).reset_index(drop=True)
f_top_10.index += 1

f_top_10.round(3)
name team position defense_quality
1 Logan O'Connor COL F 1.101
2 Noel Acciari PIT F 1.063
3 Elias Pettersson VAN F 1.018
4 Colton Sissons NSH F 0.972
5 Parker Kelly COL F 0.841
6 Joel Kiviranta COL F 0.802
7 Anthony Cirelli TBL F 0.793
8 Adam Lowry WPG F 0.787
9 Ryan Poehling PHI F 0.779
10 Alexander Wennberg SJS F 0.734

The biggest issue with this raw Defense Quality stat, is it’s interpretability. Defense Rating is in goals, WAR is in wins, but Defense Quality is in standard deviations of the mean. That’s neither sexy, nor is it meaningful without knowing the data. Baseball has had a solution for this forever, the plus statistic! We can transform Defense Quality as is, to Defense Quality Plus, so that it is now measured in “percent of the average.” The numbers are nearly identical since our average was at 0, and we’re just shifting it to 100. This means “Spurgeon has 97% better defense than the average player,” or “Spurgeon gives up 97% less quality offense for the opposing team than average.” This plus version is what I propose to be the default way of showing Defense Quality, much like how the default way of showing PDO is the transformed statistic, rather than raw sum percentage.

dq_plus = 100 + defense_dq['defense_quality'] * 100
defense_dq.loc[:, ['dq_plus']] = dq_plus / np.mean(dq_plus) * 100

dq_plus = 100 + forward_dq['defense_quality'] * 100
forward_dq.loc[:, ['dq_plus']] = dq_plus / np.mean(dq_plus) * 100

top10_d = defense_dq[display_stats + ['dq_plus']].sort_values(by='dq_plus', ascending=False).head(10).reset_index(drop=True)
top10_d.index += 1

top10_d.round(3)

name team position defense_quality dq_plus
1 Jared Spurgeon MIN D 0.970 197.041
2 Vladislav Gavrikov LAK D 0.865 186.464
3 Mikey Anderson LAK D 0.812 181.166
4 Chris Tanev TOR D 0.800 180.031
5 Artem Zub OTT D 0.797 179.718
6 Rasmus Ristolainen PHI D 0.744 174.398
7 Colton Parayko STL D 0.714 171.409
8 Jaccob Slavin CAR D 0.687 168.713
9 Jake Sanderson OTT D 0.655 165.462
10 Adam Pelech NYI D 0.620 161.999

The final Defense Quality Plus (DQ+) metric transforms DQ to a percentage scale:

\[DQ+ = \frac{100 + (DQ \times 100)}{\overline{DQ+}} \times 100\]

Where $\overline{DQ+}$ is the mean of the transformed scores, centering the league average at 100.

When we graph DQ+ we can easily highlight elite vs above average defense. Since this is a normal distribution, we can focus on the standard deviations. Anything 2 standard deviations (around 76%) is considered elite. This limits us to only the top 5 as well in this case. Also, anything below 62% can be considered poor defense that needs improvement.

# Visualize the DR+ distribution
std = defense_dq['dq_plus'].std()
mean = defense_dq['dq_plus'].mean()

plt.figure(figsize=(10, 6))
sns.histplot(defense_dq['dq_plus'], bins=20, kde=True)
plt.axvline(mean, color='red', linestyle='--', label=f'Average ({mean:.2f})')
plt.axvline(mean + std, color='orange', linestyle='--', alpha=0.5, label=f'+1 Std Dev ({mean + std:.2f})')
plt.axvline(mean + 2 * std, color='green', linestyle='--', alpha=0.5, label=f'+2 Std Dev ({mean + 2 * std:.2f})')
plt.axvline(mean - std, color='orange', linestyle='--', alpha=0.5, label=f'-1 Std Dev ({mean - std:.2f})')
plt.title('Distribution of DQ+ For Defensemen')
plt.xlabel(f'DQ+ ({mean:.2f} = League Average For Defensemen)')
plt.legend()
plt.show()

png

Although not the intention of the metric, it can be used to measure a team’s defense pretty well. If we take the average of a team’s players’ DQ+, we get a pretty good understanding on a team’s overall defense that does tend to line up well. It’s not perfect because the forward DQ+ isn’t, but it’s not a bad list. Again, this isn’t the intention of the stat, it just lines up well and shows more ways it can be used.

team_dq = pd.concat([defense_dq, forward_dq])
team_dq = team_dq.groupby('team')['dq_plus'].mean().sort_values(ascending=False).reset_index()
top10_team_dq = team_dq.head(10)

top10_team_dq.round(3)
team dq_plus
0 PHI 128.909
1 LAK 128.790
2 WPG 123.566
3 COL 120.687
4 STL 120.397
5 VAN 120.346
6 MIN 115.182
7 EDM 115.108
8 CGY 112.099
9 VGK 110.477

Overall, Defense Quality provides an excellent way to compare defensemen, and understand their defensive impact on the quality of offense an opposing team gets while they’re on the ice. It is a very good stat to pair with Defensive Rating, so that one can understand that results of a player’s defense, and the reasons behind it. Since Defense Quality directly compares players in the same position, it is able to be quickly understood as to what is “good” vs “bad” Defense Quality without needing a deep understanding of the statistic.

Thanks For Reading!

If you made it this far, thanks! I want to keep this blog pretty light hearted and fun, more like my first two, but I thought it would be interesting to go deep into trying to fill a hole that exists. I’ve also started an X account in case anyone is interested, but I’m not sure how much I’ll be using it. Would love to hear your thoughts on where else this stat can be useful! Also, let me know if you more preferred the previous ones aimed at being more fun, or this one that aimed at providing a real solution.