What’s in my database

I’ll be referring regularly to data pulled from my database, so I thought it would be useful to summarise which matches the database covers and what data I have.

The figures in this post haven’t been updated, but you can read the most up-to-date version here.

The database was last updated on 2 December, and I will update it regularly. That update came before the start of the WBBL, but it includes parts of the domestic competitions in Bangladesh, South Africa and New Zealand.


Welcome to Strike Rate

Welcome to my new blog, which will be covering the statistics of Twenty20 cricket over this summer.

Last summer, during the Big Bash League, I became interested in using stats to give myself a sense of how likely a team was to win. Twenty20 cricket is a new game, and I didn’t have a good sense of what sort of score gave you a very good chance of winning.

I built a small dataset, just of the topline results from the BBL and its state-based predecessor the Big Bash over the last ten years, and used that to post graphs during games that were underway.
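As a rough sketch of the kind of calculation this involves (not the code behind my graphs), the snippet below groups first-innings scores into bands and counts how often the side batting first went on to win. The function name, the 20-run band width and the sample rows are all assumptions made for illustration; the rows are made-up placeholders, not real BBL results.

```python
from collections import defaultdict

BAND_WIDTH = 20  # assumed band size, chosen only for illustration


def win_rate_by_score_band(matches, band_width=BAND_WIDTH):
    """Return {band_start: (wins, games, win_rate)} for sides batting first.

    `matches` is an iterable of (first_innings_score, batting_first_won) pairs.
    """
    tally = defaultdict(lambda: [0, 0])  # band_start -> [wins, games]
    for first_innings_score, batting_first_won in matches:
        band_start = (first_innings_score // band_width) * band_width
        tally[band_start][1] += 1
        if batting_first_won:
            tally[band_start][0] += 1
    return {
        band: (wins, games, wins / games)
        for band, (wins, games) in sorted(tally.items())
    }


# Made-up placeholder rows, NOT real results:
# (first-innings score, did the side batting first win?)
sample_matches = [
    (128, False), (135, False), (142, True), (149, False),
    (156, True), (158, False), (163, True), (171, True),
    (177, True), (184, True),
]

for band, (wins, games, rate) in win_rate_by_score_band(sample_matches).items():
    print(f"{band}-{band + BAND_WIDTH - 1}: {wins}/{games} won ({rate:.0%})")
```

With only a handful of matches in each band the percentages jump around a lot, which is one reason a larger dataset helps.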

(You can follow this blog on Twitter, where I may well do similar analysis this summer.)

This time I’d like to go much bigger, providing analysis about specific matches based on a much larger dataset, and trying to answer broader questions about how the game works.

This is a tentative first step, and initially I’m going to focus on describing the game as it exists, not on trying to predict likely results. I’d appreciate any feedback on statistical methods that might help me draw stronger conclusions, or on particular questions you’d like answered. At some point I’ll work on producing guides to some upcoming matches, and I’d also appreciate feedback on which bits of information are the most useful.

I’m going to focus entirely on Twenty20 cricket, both men’s and women’s. I’ve built a database which covers all men’s domestic and international Twenty20, women’s internationals and selected women’s domestic Twenty20 competitions (the data gets a bit sketchy here). Next week I’ll do a post outlining what I’ve included and how much data there is, but there are over 6,000 matches’ worth of data in the database.

I was planning more of an introduction explaining what I was doing, but I’m going to come back to that introduction next week, and keep this short.

The inaugural Women’s Big Bash League starts this Saturday, with three games played over the weekend (I’ll be at the Sydney derby in Penrith on Sunday).

Because of this impending deadline, I’m going to prioritise a couple of posts explaining the main statistical differences between men’s and women’s cricket, for anyone who is interested in women’s cricket but doesn’t have a feel for (for example) what kind of score would likely put a team in a winning position. Then I will follow that up with an analysis of the stats for the players signed to play in the WBBL.

Next week, I’ll start doing some analysis building up to the Big Bash League. I’m expecting the blog will be a mix of previews of upcoming games, analysis of ongoing competitions, and analysis of deeper questions that should apply more broadly across the game (for example, how does the proportion of runs scored in fours and sixes affect a team’s likelihood of victory?).