Use case development and prototyping of a solution using NFL’s NextGen player tracking data to generate valuable scouting and coaching insights for NFL teams.
Description
Note: This was a team hackathon submission.
Compared to quarterbacks or wide receivers, the specific traits that mark top-rated offensive linemen in the NFL are not as well quantified. Using the NFL’s NextGen dataset, which records in-game positional (xyz) data in 0.1 s increments, our hackathon team set out to quantify the relative importance of these traits to offensive lineman performance across the NFL, so that coaches and scouts can take a more rigorous approach to filling out their offensive line. The analysis consisted of four main components:
- Build an ETL and cleaning process for the massive, denormalized NFL NextGen dataset stored in AWS S3 to get the data into a usable form.
- Develop an offensive lineman rating based on how often the defender a lineman was blocking beat him and applied pressure to the QB.
- Calculate key performance indicators such as acceleration off the snap, contact angle with defenders, velocity after contact, etc.
- Standardize the features and fit a linear regression model to predict the o-lineman rating. Once the model reaches the desired accuracy, rank the feature weights by absolute value to determine the most important predictors of offensive lineman rating.
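The fourth component can be sketched in a few lines. This is a minimal illustration, not the team's actual modeling code: it uses NumPy's least-squares solver as a stand-in for whatever regression library was used, and the function name `rank_features`, the feature names, and the synthetic data are all hypothetical. It assumes every feature column has non-zero variance.

```python
import numpy as np

def rank_features(X, y, names):
    """Z-score the features, fit ordinary least squares, rank features by |weight|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize each column so the fitted weights are on a comparable scale
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Prepend a column of ones so the model includes an intercept
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    weights = coef[1:]  # drop the intercept
    order = np.argsort(-np.abs(weights))  # largest absolute weight first
    return [(names[i], float(weights[i])) for i in order]

# Synthetic data for illustration only (not NFL data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.2 * X[:, 0] - 1.5 * X[:, 1] + 0.7 * X[:, 2] + rng.normal(scale=0.1, size=200)
names = ["accel_off_snap", "contact_angle", "velocity_after_contact"]
ranking = rank_features(X, y, names)
for name, weight in ranking:
    print(f"{name}: {weight:+.2f}")
```

Because the features are standardized first, a larger absolute weight can be read as a stronger association with the rating per standard deviation of that feature, which is what makes the ranking meaningful.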
My Contribution
For this project, I independently developed the first and second components outlined above. To create our offensive lineman rating, I wrote an algorithm to segment tracking data into individual plays and search for instances where a defender was able to beat a particular o-lineman and apply pressure to the quarterback.
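The "beaten" rule at the heart of the rating can be isolated as a small function. The sketch below is illustrative only: the names `is_beaten` and `block_won` are hypothetical, coordinates are assumed to be (x, y) pairs in yards, and the 4-yard default matches the threshold used in the sample code further down, so treat it as a tunable assumption.

```python
import math

def is_beaten(ol_xy, rusher_xy, qb_xy, pressure_radius=4.0):
    """True if the rusher is closer to the QB than the lineman is
    and also within pressure_radius yards of the QB."""
    rusher_to_qb = math.dist(rusher_xy, qb_xy)
    ol_to_qb = math.dist(ol_xy, qb_xy)
    return rusher_to_qb < ol_to_qb and rusher_to_qb < pressure_radius

def block_won(frames, pressure_radius=4.0):
    """frames: iterable of (ol_xy, rusher_xy, qb_xy), one tuple per 0.1 s step.
    The block counts as a win only if the lineman is never beaten."""
    return not any(is_beaten(ol, r, qb, pressure_radius) for ol, r, qb in frames)

qb = (0.0, 0.0)
lineman = (3.0, 0.0)
held = [(lineman, (5.0, 0.0), qb), (lineman, (4.5, 1.0), qb)]
lost = held + [(lineman, (2.0, 1.0), qb)]  # rusher gets inside the lineman
print(block_won(held), block_won(lost))
```

Requiring both conditions keeps a rusher who loops far behind the play from counting as pressure: he must be past the blocker and near the quarterback.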
The code in my GitHub repo contains only code that I wrote and contributed independently, along with data samples that illustrate the inputs and outputs of the ETL process I created.
Sample code for the algorithm I wrote to compute the offensive lineman rating is posted below:
# Iterate through plays in context_df and extract block win rate for each offensive lineman.
# context_df, offense_df, defense_df and olineplayerstats are assumed to be defined earlier
# (they are not shown in this excerpt).
import numpy as np

counter = 0
for index, play in context_df.iterrows():
    # Copy the slices so the x shift below does not trigger pandas' SettingWithCopyWarning
    o_play = offense_df[offense_df['ballsnaptime'] == play['ballsnaptime']].copy()
    d_play = defense_df[defense_df['ballsnaptime'] == play['ballsnaptime']].copy()
    # Only first 2.5 seconds of play are relevant for our analysis. Note 2.5 seconds = 25 increments of time
    timesteps = o_play.groupby('time', as_index=False).first()['time']
    timesteps = timesteps[:25]
    # Shift x by 10 yards so it lines up with the 0-100 yard-line scale used for 'los' below
    o_play.loc[:, 'x'] -= 10
    d_play.loc[:, 'x'] -= 10
    # Get starting position of QB and Center to orient the game correctly
    try:
        start_center_pos = o_play[(o_play['position'] == 'C') & (o_play['time'] == timesteps[0])]['x'].tolist()[0]
        start_qb_pos = o_play[(o_play['position'] == 'QB') & (o_play['time'] == timesteps[0])]['x'].tolist()[0]
    except (IndexError, KeyError):
        # No center or QB tracked at the snap: skip this play
        continue
    # Determine direction of offensive play
    left_to_right = start_center_pos > start_qb_pos
    # Determine absolute x of LOS
    if start_center_pos < 50:
        los_x = play['los']
    else:
        los_x = 100 - play['los']
    # Initialize passrushers list: define as defenders that have crossed the line of scrimmage by the end of obs. window
    passrushers = []
    for _, player in d_play.iterrows():
        # Define 'crossing LOS' depending on which direction the offense is going
        if left_to_right:
            crossed_los = (player['x'] <= los_x)
        else:
            crossed_los = (player['x'] >= los_x)
        if crossed_los and player['player'] not in passrushers:
            passrushers.append(player['player'])
    if len(passrushers) < 4:
        print('Faulty passrushers encountered')
        continue
    # Initialize assignments df - get all players except for QB
    assignments = o_play.groupby('player', as_index=False).first()[['player', 'position']]
    assignments = assignments[assignments['position'] != 'QB']
    assignments['assignment'] = 'None'
    # Dynamically assign coverage using distance wrt each pass rusher at each time step
    # playerlosstracker will reflect 0 for a player if player was never beaten by his assignment
    playerlosstracker = {}
    # For each player observed on this play:
    # 1. Increment 'observations' by 1
    # 2. Reset playerlosstracker to 0 to observe the new play
    for _, row in assignments.iterrows():
        olineplayerstats.at[row['player'], 'observations'] += 1
        playerlosstracker[row['player']] = 0
    for time in timesteps:
        # Data to analyze in each iteration
        d_step = d_play[d_play['time'] == time]
        o_step = o_play[o_play['time'] == time]
        # Get QB Position
        qb_row = o_step[o_step['position'] == 'QB']
        qb_xy = np.array([qb_row['x'].iloc[0], qb_row['y'].iloc[0]])
        # Loop through each player, get their x,y coordinates, and compare to each pass rusher
        for idx, row in assignments.iterrows():
            try:
                ol_row = o_step[o_step['player'] == row['player']]
                ol_xy = np.array([ol_row['x'].iloc[0], ol_row['y'].iloc[0]])
            except IndexError:
                # Quick fix for missing data: skip this player at this time step
                continue
            distances_to_rushers = {}
            for name in passrushers:
                d_row = d_step[d_step['player'] == name]
                d_xy = np.array([d_row['x'].iloc[0], d_row['y'].iloc[0]])
                distances_to_rushers[name] = np.linalg.norm(d_xy - ol_xy)
            # The nearest pass rusher is this player's assignment at this time step
            assignment = min(distances_to_rushers, key=distances_to_rushers.get)
            # Write back through the DataFrame; assigning to the iterrows() row would not persist
            assignments.at[idx, 'assignment'] = assignment
            # Track coordinates of assignment specifically
            asn_row = d_step[d_step['player'] == assignment]
            asn_xy = np.array([asn_row['x'].iloc[0], asn_row['y'].iloc[0]])
            # Key calculation: record a loss whenever the assignment is closer to the QB
            # than the o-lineman and within 4 yards of the QB during the 2.5 s window
            asn_to_qb = np.linalg.norm(asn_xy - qb_xy)
            if asn_to_qb < np.linalg.norm(ol_xy - qb_xy) and asn_to_qb < 4:
                playerlosstracker[row['player']] += 1
    # A player earns a block win if he was never beaten during the play
    for player in playerlosstracker:
        if playerlosstracker[player] == 0:
            olineplayerstats.at[player, 'blockwins'] += 1
    counter += 1
olineplayerstats['winrate'] = olineplayerstats['blockwins'] / olineplayerstats['observations']
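The per-player loop over pass rushers above could also be written as a single array operation per time step. The following is a sketch of that alternative using NumPy broadcasting, not part of the original submission; the function name `nearest_rusher` and the sample coordinates are made up for illustration.

```python
import numpy as np

def nearest_rusher(ol_xy, rusher_xy):
    """For each lineman, return the index of the closest rusher and that distance.

    ol_xy:     array-like of shape (n_linemen, 2)
    rusher_xy: array-like of shape (n_rushers, 2)
    """
    ol_xy = np.asarray(ol_xy, dtype=float)
    rusher_xy = np.asarray(rusher_xy, dtype=float)
    # Broadcasting builds every lineman-rusher difference: (n_linemen, n_rushers, 2)
    diffs = ol_xy[:, None, :] - rusher_xy[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    return nearest, dists[np.arange(len(ol_xy)), nearest]

# Made-up coordinates in yards
linemen = [[10.0, 20.0], [10.0, 24.0]]
rushers = [[12.0, 20.0], [11.0, 25.0], [15.0, 30.0]]
idx, dist = nearest_rusher(linemen, rushers)
print(idx, dist)
```

This computes the same nearest-defender assignment as the dictionary-and-`min` approach, but in one pass per time step, which matters when the loop runs over every play in a season of tracking data.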