NFL Offensive Lineman Analysis

NFL Team Hackathon Submission

Use-case development and prototyping of a solution that uses the NFL’s NextGen player tracking data to generate scouting and coaching insights for NFL teams.

Description

Note: This was a team hackathon submission.

Compared with quarterbacks or wide receivers, the traits that distinguish top-rated offensive linemen in the NFL are not well quantified. Using the NFL’s NextGen dataset, which records in-game positional (x, y, z) data in 0.1 s increments, our hackathon team set out to quantify the relative importance of these traits to offensive lineman performance across the league, so that coaches and scouts can take a more rigorous approach to filling their offensive line. The analysis consisted of four main components:

  1. Build an ETL and cleaning process for the massive, denormalized NFL NextGen dataset stored in AWS S3 to get the data into a usable form.
  2. Develop an offensive lineman rating based on how often the defender each lineman was blocking beat him and applied pressure to the QB.
  3. Calculate key performance indicators such as acceleration off the snap, contact angle with defenders, velocity after contact, etc.
  4. Standardize the features and fit a linear regression model to predict o-lineman rating. After reaching the desired accuracy, rank the feature weights by absolute value to determine the most important predictors of offensive lineman rating.
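
The fourth component can be illustrated with a minimal sketch. This is not the team's model code: it assumes a plain NumPy least-squares fit, and the feature names and synthetic data are hypothetical, chosen only to show how standardized weights can be ranked by absolute value.

```python
import numpy as np

def rank_features(X, y, names):
    """Standardize features, fit ordinary least squares, rank weights by |value|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Z-score each column so the fitted weights share a common scale
    # (assumes no feature is constant, otherwise the std would be zero)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Prepend an intercept column and solve the least-squares problem
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    weights = coef[1:]
    # Largest absolute weight first
    order = np.argsort(-np.abs(weights))
    return [(names[i], float(weights[i])) for i in order]

# Synthetic data (hypothetical): the rating depends strongly on the first
# feature, weakly and negatively on the second, and not at all on the third
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 0.5 + 2.0 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.01, size=200)
names = ["accel_off_snap", "contact_angle", "velocity_after_contact"]
ranking = rank_features(X, y, names)
for name, weight in ranking:
    print(name, round(weight, 3))
```

Because the features are standardized before fitting, the magnitude of each weight is comparable across features, which is what makes ranking by absolute value meaningful.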

My Contribution

For this project, I independently developed the first and second components outlined above. To create our offensive lineman rating, I wrote an algorithm to segment tracking data into individual plays and search for instances where a defender was able to beat a particular o-lineman and apply pressure to the quarterback.
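
As a rough illustration of the segmentation step, the sketch below groups tracking rows by snap time and keeps only the first time steps of each play. It assumes the column names `ballsnaptime` and `time` used in the sample code; the helper name `split_into_plays` and the toy data are hypothetical, and this is one possible approach rather than the exact ETL code.

```python
import pandas as pd

def split_into_plays(tracking_df, max_steps=25):
    """Return {ballsnaptime: DataFrame} with the first `max_steps` time steps of each play."""
    plays = {}
    for snap_time, play_df in tracking_df.groupby("ballsnaptime"):
        # Sorted unique timestamps for this play; keep only the observation window
        steps = sorted(play_df["time"].unique())[:max_steps]
        plays[snap_time] = play_df[play_df["time"].isin(steps)].copy()
    return plays

# Toy tracking data (hypothetical): two plays, identified by snap time
toy = pd.DataFrame({
    "ballsnaptime": [100, 100, 100, 100, 200, 200],
    "time": [0.0, 0.0, 0.1, 0.2, 0.0, 0.1],
    "player": ["A", "B", "A", "A", "A", "A"],
})
plays = split_into_plays(toy, max_steps=2)
for snap_time, play_df in plays.items():
    print(snap_time, len(play_df))
```

Grouping once avoids re-filtering the full tracking frame for every play, which matters on a dataset of this size.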

My GitHub repo contains only the code I wrote and contributed independently, along with data samples that illustrate the inputs and outputs of the ETL process I created.

Sample code for the algorithm I wrote to capture offensive lineman rating is posted below:

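    import numpy as np

    # context_df (one row per play), offense_df and defense_df (tracking rows), and
    # olineplayerstats (indexed by player, with 'observations' and 'blockwins' columns)
    # are assumed to be defined earlier.

    # Iterate through plays in context_df and extract block win rate for each offensive lineman
    counter = 0
    for _, play in context_df.iterrows():
        # .copy() so the in-place x shift below does not act on a slice of the source frame
        o_play = offense_df[offense_df['ballsnaptime']==play['ballsnaptime']].copy()
        d_play = defense_df[defense_df['ballsnaptime']==play['ballsnaptime']].copy()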

        # Only the first 2.5 seconds of each play are relevant for our analysis (25 time steps of 0.1 s)
        timesteps = o_play.groupby('time', as_index=False).first()['time']
        timesteps = timesteps[:25]

        # Shift x by 10 for both units (presumably the end-zone depth, so that x
        # lines up with the 0-100 scale used for the LOS below)
        o_play.loc[:,'x'] -= 10
        d_play.loc[:,'x'] -= 10
        
        # Get starting position of QB and Center to orient the play correctly
        try:
            first_step = timesteps.iloc[0]
            start_center_pos = o_play[(o_play['position']=='C')&(o_play['time']==first_step)]['x'].tolist()[0]
            start_qb_pos = o_play[(o_play['position']=='QB')&(o_play['time']==first_step)]['x'].tolist()[0]
        except IndexError:
            # Skip plays with no tracking rows, or with no C or QB at the first time step
            continue

        # Determine direction of offensive play: the center lines up ahead of the QB
        left_to_right = start_center_pos > start_qb_pos
        
        # Determine absolute x of LOS
        if start_center_pos < 50:
            los_x = play['los']
        else:
            los_x = 100 - play['los']

        # Build passrushers list: defenders who appear on the offense's side of the line of scrimmage.
        # Note that this scans every tracked row of the play in d_play, not only the 25-step window.
        passrushers = []
        for _, player in d_play.iterrows():
            # Define 'crossing LOS' depending on which direction the offense is going
            if left_to_right:
                crossed_los = (player['x'] <= los_x)
            else:
                crossed_los = (player['x'] >= los_x)

            if crossed_los and player['player'] not in passrushers:
                passrushers.append(player['player'])

        # Skip plays with fewer than 4 detected pass rushers
        if len(passrushers) < 4:
            print('Faulty passrushers encountered')
            continue
                
        # Initialize assignments df - get all players except for QB
        assignments = o_play.groupby('player', as_index=False).first()[['player', 'position']]
        assignments = assignments[assignments['position'] != 'QB']
        assignments['assignment'] = 'None'
        
        # Dynamically assign each player to the nearest pass rusher at each time step (distance only)
        # playerlosstracker stays at 0 for a player who was never beaten by his assignment
        playerlosstracker = {}

        # For each player on this play:
        # 1. Increment 'observations' by 1
        # 2. Reset playerlosstracker to 0 to observe the new play
        for _, row in assignments.iterrows():
            olineplayerstats.at[row['player'], 'observations'] += 1
            playerlosstracker[row['player']] = 0
        
        for time in timesteps:

            # Data to analyze in each iteration
            d_step = d_play[d_play['time']==time]
            o_step = o_play[o_play['time']==time]

            # Get QB position; skip this time step if the QB has no tracking row
            qb_rows = o_step[o_step['position']=='QB']
            if qb_rows.empty:
                continue
            qb_xy = qb_rows[['x', 'y']].iloc[0].to_numpy(dtype=float)

            # Coordinates of every pass rusher tracked at this time step
            rusher_xy = {}
            for name in passrushers:
                d_rows = d_step[d_step['player']==name]
                if not d_rows.empty:
                    rusher_xy[name] = d_rows[['x', 'y']].iloc[0].to_numpy(dtype=float)
            if not rusher_xy:
                continue

            # Loop through each player, get their x,y coordinates, and compare to each pass rusher
            for idx, row in assignments.iterrows():
                ol_rows = o_step[o_step['player']==row['player']]
                # Missing data: skip this player for this time step only
                if ol_rows.empty:
                    continue
                ol_xy = ol_rows[['x', 'y']].iloc[0].to_numpy(dtype=float)

                # Assign the nearest pass rusher and record it in the assignments df
                distances_to_rushers = {name: np.linalg.norm(xy-ol_xy) for name, xy in rusher_xy.items()}
                assignment = min(distances_to_rushers, key=distances_to_rushers.get)
                assignments.at[idx, 'assignment'] = assignment
                asn_xy = rusher_xy[assignment]

                # Key calculation: count a loss whenever the assignment is closer to the QB than
                # the player and within 4 yards of the QB (the threshold in the condition below)
                asn_to_qb = np.linalg.norm(asn_xy-qb_xy)
                if asn_to_qb < np.linalg.norm(ol_xy-qb_xy) and asn_to_qb < 4:
                    playerlosstracker[row['player']] += 1
        
        # A play counts as a block win only if the player was never beaten during the window
        for player in playerlosstracker:
            if playerlosstracker[player] == 0:
                olineplayerstats.at[player, 'blockwins'] += 1
        counter += 1

    # Block win rate = plays without a loss / plays observed
    olineplayerstats['winrate'] = olineplayerstats['blockwins']/olineplayerstats['observations']
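
The per-frame test at the heart of the loop above can be isolated as a small function, which makes the rule easier to check on hand-picked coordinates. This is a sketch only: the function name `rusher_beat_blocker` and the `pressure_radius` parameter are hypothetical, with the default of 4 taken from the threshold used in the sample code.

```python
import numpy as np

def rusher_beat_blocker(blocker_xy, rusher_xy, qb_xy, pressure_radius=4.0):
    """True if the rusher is closer to the QB than the blocker AND inside the pressure radius."""
    blocker_xy = np.asarray(blocker_xy, dtype=float)
    rusher_xy = np.asarray(rusher_xy, dtype=float)
    qb_xy = np.asarray(qb_xy, dtype=float)
    rusher_to_qb = np.linalg.norm(rusher_xy - qb_xy)
    blocker_to_qb = np.linalg.norm(blocker_xy - qb_xy)
    return bool(rusher_to_qb < blocker_to_qb and rusher_to_qb < pressure_radius)

qb = (0.0, 0.0)
# Rusher has slipped past the blocker and is 2 yards from the QB
beaten = rusher_beat_blocker(blocker_xy=(3.0, 0.0), rusher_xy=(2.0, 0.0), qb_xy=qb)
# Rusher is past the blocker but still outside the pressure radius
too_far = rusher_beat_blocker(blocker_xy=(6.0, 0.0), rusher_xy=(5.0, 0.0), qb_xy=qb)
# Blocker is still between the rusher and the QB
held = rusher_beat_blocker(blocker_xy=(3.0, 0.0), rusher_xy=(3.5, 0.0), qb_xy=qb)
print(beaten, too_far, held)
```

Exposing the radius as a parameter makes it straightforward to test how sensitive the resulting win rates are to that threshold.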