Jacob Mortensen

Statistical methods for tracking data in sports.

In this thesis, we examine player tracking data in basketball and soccer and explore statistical methods and applications related to this type of data.

First, we present a method for nonparametric estimation of continuous-state Markov transition densities, using as our foundation a Poisson process representation of the joint input-output space of the Markovian transitions. Modeling a transition density as a point process creates a general framework that admits a variety of implementations and includes some historical methods for nonparametric transition density estimation as special cases. Representing transition densities with a nonstationary point process allows the form of the transition density to vary rapidly over the space, resulting in a very flexible estimator of the transition mechanism. A key feature of this point process representation is that it allows spatial structure to inform transition density estimation. We illustrate this by using our method to model ball movement in the National Basketball Association, which lets us capture the effects of spatial features, such as the three-point line, that affect transition density values.
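As a rough illustration of the representation described above, the following sketch uses notation that is assumed here rather than taken from the thesis: $\lambda(x, y)$ denotes the intensity of a Poisson process on the joint input-output space, where $x$ is the current state and $y$ the next state. Under that assumption, the transition density is the intensity normalized over the output coordinate:

```latex
% Assumed notation: \lambda(x, y) is the intensity of a Poisson process on the
% joint input-output space \mathcal{X} \times \mathcal{Y}, with one point
% (x_t, x_{t+1}) per observed transition.
\[
  f(y \mid x) \;=\; \frac{\lambda(x, y)}{\int_{\mathcal{Y}} \lambda(x, y') \, dy'} .
\]
```

Because $\lambda$ is not required to be stationary, the shape of $f(\cdot \mid x)$ can change from one input location $x$ to another, which is the flexibility the paragraph above refers to.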

Next, we consider a sports science application. Sports science has seen substantial benefit from player tracking data, as high-resolution coordinate data permits sports scientists to obtain to-the-second estimates of external load metrics, such as acceleration load and high-speed running distance, traditionally used to understand the physical toll a game takes on an athlete. Unfortunately, collecting this data requires installing expensive hardware and paying costly licensing fees to data providers, which restricts its availability. Algorithms have been developed that convert a traditional broadcast feed to x-y coordinate data, making tracking data easier to acquire, but coordinates are available for an athlete only when that player is within the camera frame. This censoring leads to inaccuracies in player load estimates, limiting the usefulness of this data for sports scientists. In this research, we develop models that predict offscreen load metrics and demonstrate the viability of broadcast-derived tracking data for understanding external load in soccer.
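To make the external load metrics concrete, here is a minimal sketch of how they might be computed from x-y coordinates, and of how offscreen frames reduce the totals. Everything in it is an assumption for illustration, not the thesis's method: the function name `external_load`, the 10 Hz sampling rate, the 5.5 m/s high-speed threshold, and the definition of acceleration load as the summed absolute acceleration multiplied by the frame interval.

```python
import numpy as np

def external_load(xy, hz=10.0, hsr_threshold=5.5, visible=None):
    """Hypothetical load summary from positions in metres sampled at `hz` frames/s.

    `visible` is an optional boolean array marking frames where the player is
    on screen; steps that touch an offscreen frame are excluded.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    visible = np.ones(n, dtype=bool) if visible is None else np.asarray(visible, dtype=bool)
    dt = 1.0 / hz

    # Distance covered between consecutive frames (length n - 1).
    step = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    step_ok = visible[1:] & visible[:-1]      # both endpoints on screen

    speed = step / dt                          # metres per second
    accel = np.diff(speed) / dt                # metres per second squared (length n - 2)
    accel_ok = step_ok[1:] & step_ok[:-1]      # both neighbouring steps usable

    return {
        "total_distance": step[step_ok].sum(),
        "hsr_distance": step[step_ok & (speed > hsr_threshold)].sum(),
        # Assumed convention: summed absolute acceleration times frame interval.
        "accel_load": np.abs(accel[accel_ok]).sum() * dt,
    }

# A player running in a straight line at 10 m/s for one second.
track = np.column_stack([np.arange(11.0), np.zeros(11)])
full = external_load(track)

# The same run with frames 4 to 6 outside the camera frame.
onscreen = np.ones(11, dtype=bool)
onscreen[4:7] = False
censored = external_load(track, visible=onscreen)
```

In this toy example the censored totals are smaller than the full ones even though the player's actual effort is unchanged, which is the gap that models of offscreen load aim to fill.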

Finally, we address a tactics question in soccer. A key piece of information when evaluating a matchup in soccer is the formation used by each team. For example, a weaker team playing against a stronger opponent may choose to play a highly defensive formation with five defenders, four midfielders, and one attacker, commonly referred to as a 5-4-1. Multiple researchers have developed methods for learning these formations from tracking data, but these methods do not work when faced with the heavy censoring inherent in broadcast tracking data. We present an algorithm for aligning broadcast tracking data with the origin, and then show how the aligned data can be used to learn formations, with performance comparable to that of formations learned from the full tracking data.

Keywords: tracking data; Poisson point process; Markov model; broadcast tracking data; mixture model.