Harsha Perera

Cricket Analytics

This thesis consists of a compilation of three research papers and a non-statistical essay.

Chapter 2 considers the decision problem of when to declare during the third innings of a Test cricket match. Several factors affect the declaring team's decision, including the target score, the number of overs remaining, the relative desire to win versus draw, and the scoring characteristics of the particular match. We develop decision rules and assess them against historical matches. We observe discrepancies between the optimal time to declare and what takes place in practice.

Chapter 3 considers the determination of optimal team lineups in Twenty20 cricket where a lineup consists of three components: team selection, batting order and bowling order. Via match simulation, we estimate the expected runs scored minus the expected runs allowed for a given lineup. The lineup is then optimized over a vast combinatorial space via simulated annealing. We observe that the composition of an optimal Twenty20 lineup sometimes results in nontraditional roles for players. As a by-product of the methodology, we obtain an “all-star” lineup selected from international Twenty20 cricket.

Chapter 4 is a first attempt to investigate the importance of fielding in cricket. We introduce the metric of expected runs saved due to fielding, which is both interpretable and directly relevant to winning matches. The metric is assigned to individual players and is based on a textual analysis of match commentaries using random forest methodology. We observe that the best fielders save on average 1.2 runs per match compared to a typical fielder.

Chapter 5 is a non-statistical essay on two cricketing greats from Sri Lanka who established numerous world records and recently retired from the game. Though their record-breaking performances are now part of cricketing statistics, this chapter does not contribute to the statistical literature and should not be regarded as part of the analytical work of the thesis.

Keywords: cricket; decision rules; Gibbs sampling; parameter estimation; random forests; relative value statistics; simulated annealing; simulation; textual analysis; Twenty20 cricket