I am going to be regularly referring to data pulled from my database so I thought it would be useful to summarise what matches are covered by the database, and what data I have.
This data hasn’t been updated, but you can read the most up-to-date version here.
The database was last updated on 2 December. I will make regular updates. This last update was before the start of the WBBL, but includes parts of the domestic competitions in Bangladesh, South Africa and New Zealand.
The database currently contains data for 6225 Twenty20 matches played since the format was introduced in 2003. The vast majority produced a clear result. 163 had no result at all, while there were 109 ties. 64 of those were resolved with a super-over, 12 were resolved with a bowl-out (a previous method of breaking a tie) and 33 were other ties – either there was no tie-breaker, or it isn’t recorded in the data.
|Match type||# of matches|
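As a quick sanity check on the figures quoted above, the tie breakdown should add up to the total number of ties, and whatever is left after removing ties and no-results is the number of outright results. This is only arithmetic on the numbers in the text; the variable names are mine, and it assumes there are no result categories beyond the ones mentioned.

```python
# Figures quoted in the text; nothing here is read from the database itself.
total_matches = 6225
no_result = 163
ties_by_resolution = {"super_over": 64, "bowl_out": 12, "other": 33}

# The three kinds of tie should sum to the 109 ties quoted.
total_ties = sum(ties_by_resolution.values())

# Assumption: every match is either an outright result, a tie or a no-result.
decisive = total_matches - no_result - total_ties

print(total_ties)  # 109
print(decisive)    # 5953
```

On that assumption, 5953 of the 6225 matches ended in an outright win for one side.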
I have collected all matches which are classified as ‘Twenty20 International’ for either men or women, or ‘Twenty20’ for men. This doesn’t cover all matches played by Twenty20 rules – only certain teams qualify for particular statuses.
For Twenty20 Internationals (at least for men), only matches between the test nations and Scotland, Ireland, Afghanistan, Netherlands, Hong Kong, United Arab Emirates, Papua New Guinea and Oman count as T20Is. Similar rules apply for women. If one of these countries plays a country without status – for example Afghanistan vs Nepal – that match is not a T20I.
The status ‘Twenty20’ is the short-form equivalent of ‘first-class’ for multi-day matches (e.g. Sheffield Shield) or ‘List A’ for the domestic equivalent of one-day internationals. It applies to the top-level domestic competitions in each test-playing country, but it also applies to matches in major ICC tournaments that don’t count as T20Is. At the World Twenty20 Qualifier, for example, some matches were T20Is and others were Twenty20 matches.
For women, there is no equivalent Twenty20 status for top-level domestic leagues, so I have had to identify individual tournaments and collect those data. I have included the following countries and years:
- Australia 2009-10 to 2014-15
- New Zealand 2007-08 to 2015-16
- Pakistan 2008-09 and 2012
- South Africa 2008 and 2015-16
- Sri Lanka 2009-10 and 2015-16
- West Indies 2012
- ICC Americas 2010 and 2012
- ICC Europe 2012 and 2014
- Women’s World Twenty20 Qualifier 2013 and 2015
Interestingly, I couldn’t find any data for domestic women’s tournaments in England or India.
I have defined all of these matches as ‘Women’s Twenty20’ and I will usually include them in player statistics and match statistics. Unlike in the men’s data, there is a big sample bias towards three countries: Australia, New Zealand and Sri Lanka. For that reason I will often restrict the sample to whatever is appropriate for the question at hand.
Here is the number of matches I have for each format:
|Match type||# of matches|
Another way to look at the data is the number of matches I have per country.
|United Arab Emirates||223||8|
The SQL database consists of a number of tables which contain different data. These include matches, innings, batting stats, bowling stats, fall of wickets data, score by over, partnerships, player vs player data tables, as well as tables with data on players, teams, each series, and venues. The batting stats table, in addition to telling you how many runs were scored and how many balls were faced for each batsman, also doubles as a list of which players were in each match, so there’s a row for players who didn’t bat.
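To illustrate how the batting stats table can double as a list of who played in a match, here is a minimal sketch using an in-memory SQLite database. The table name `batting_stats`, its columns, and the convention that a player who didn't bat has `NULL` for runs and balls faced are all assumptions made for the example; the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batting_stats (      -- hypothetical table and column names
    match_id    INTEGER,
    player_id   INTEGER,
    runs        INTEGER,          -- assumed NULL when the player did not bat
    balls_faced INTEGER           -- assumed NULL when the player did not bat
);
-- Made-up sample rows: two players batted, one did not.
INSERT INTO batting_stats VALUES
    (1, 101, 45, 30),
    (1, 102, 0, 1),
    (1, 103, NULL, NULL);
""")

# Everyone who appeared in match 1, whether or not they batted.
appeared = [row[0] for row in conn.execute(
    "SELECT player_id FROM batting_stats WHERE match_id = ? ORDER BY player_id",
    (1,),
)]

# Only the players who actually batted in match 1.
batted = [row[0] for row in conn.execute(
    "SELECT player_id FROM batting_stats "
    "WHERE match_id = ? AND balls_faced IS NOT NULL ORDER BY player_id",
    (1,),
)]

print(appeared)  # [101, 102, 103]
print(batted)    # [101, 102]
```

The practical point is that counting rows gives appearances, while averages and strike rates should only use the rows where the player actually batted.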
I have data on every innings played in each match, and I have batting and bowling stats for all matches except those which were abandoned before they started. Likewise, the fall of wickets and partnerships data is reasonably complete, although in some cases it doesn’t record which players were involved in a partnership or which player was dismissed at a given fall of wicket.
The database also contains information on the score at the end of each over, but this only exists for 3840 matches. I also have batting and bowling stats broken down by each combination of batsman and bowler, but only for 2989 matches. These are still substantial samples, but before using them in any analysis it’s important to check that you are comparing like with like, for example by restricting every figure in a comparison to the same set of matches.
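One way to keep an analysis on comparable data is to first work out which matches actually have over-by-over scores, and then restrict everything else to that set. The sketch below does this against an in-memory SQLite database; the table names `matches` and `over_scores` and their columns are hypothetical stand-ins, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (match_id INTEGER PRIMARY KEY);   -- hypothetical schema
CREATE TABLE over_scores (
    match_id INTEGER,
    innings  INTEGER,
    over_no  INTEGER,
    score    INTEGER          -- cumulative score at the end of the over
);
-- Made-up sample rows: match 2 has no over-by-over data.
INSERT INTO matches VALUES (1), (2), (3);
INSERT INTO over_scores VALUES (1, 1, 1, 7), (1, 1, 2, 15), (3, 1, 1, 4);
""")

# Matches that have at least one over-by-over row.
covered = [row[0] for row in conn.execute("""
    SELECT m.match_id
    FROM matches AS m
    WHERE EXISTS (SELECT 1 FROM over_scores AS o WHERE o.match_id = m.match_id)
    ORDER BY m.match_id
""")]

total = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]

print(covered)                   # [1, 3]
print(len(covered), "of", total) # 2 of 3
```

The same `EXISTS` filter can be added to any other query so that, say, an average first-innings score is computed over the same matches as the over-by-over figures it is being compared with.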
I also have a table with personal information on 7170 players, including their name, their nationality, their gender (derived from which games they have played in) and their batting and bowling style. For some players I also have data on their birthdate and height, although this isn’t currently in a usable format. This table includes 5690 male players and 1480 female players.
I also have data on each series played. A ‘series’ can be a major tournament like the Big Bash League or the World Twenty20, or just a tour or a single match played between two countries. Each series represents only one season’s worth of play, so the Big Bash League appears in the table four times. I have a field called ‘multi_year_series’ which can be used to group together tournaments which happen in multiple years. The table currently has data on 463 series.
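As a rough illustration of how the ‘multi_year_series’ field groups seasons together, here is a sketch using an in-memory SQLite database. Only the `multi_year_series` field name comes from the description above; the table name `series`, the other columns and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (             -- hypothetical table and column names
    series_id         INTEGER PRIMARY KEY,
    name              TEXT,
    season            TEXT,
    multi_year_series TEXT        -- shared label across seasons
);
-- Illustrative rows only: one row per season of a recurring tournament.
INSERT INTO series VALUES
    (1, 'Big Bash League', '2011-12', 'Big Bash League'),
    (2, 'Big Bash League', '2012-13', 'Big Bash League'),
    (3, 'Big Bash League', '2013-14', 'Big Bash League'),
    (4, 'Big Bash League', '2014-15', 'Big Bash League'),
    (5, 'World Twenty20',  '2014',    'World Twenty20');
""")

# Count how many seasons sit under each multi-year label.
seasons_per_tournament = conn.execute("""
    SELECT multi_year_series, COUNT(*) AS seasons
    FROM series
    GROUP BY multi_year_series
    ORDER BY multi_year_series
""").fetchall()

print(seasons_per_tournament)
# [('Big Bash League', 4), ('World Twenty20', 1)]
```

Grouping on the shared label rather than on the series name avoids problems if a tournament's name changes slightly from one season to the next.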
I have a table with data on 406 teams. This data includes the gender of the team (the Australian men’s team and women’s team are treated separately) and the home country of every team. In some cases I have also filled in the geographic name of the team (‘Adelaide’) and the nickname (‘Strikers’).
Finally, there is a table with data on 346 venues. This includes the country that the venue is in. By joining this table to the matches table, you can analyse match data by host country.
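The join between venues and matches described above might look something like the following sketch, again using an in-memory SQLite database. The table names, the `venue_id` key and the sample rows are assumptions for illustration rather than the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venues (             -- hypothetical table and column names
    venue_id INTEGER PRIMARY KEY,
    name     TEXT,
    country  TEXT
);
CREATE TABLE matches (
    match_id INTEGER PRIMARY KEY,
    venue_id INTEGER REFERENCES venues(venue_id)
);
-- Made-up sample rows.
INSERT INTO venues VALUES
    (1, 'Venue A', 'Australia'),
    (2, 'Venue B', 'Australia'),
    (3, 'Venue C', 'New Zealand');
INSERT INTO matches VALUES (1, 1), (2, 2), (3, 3), (4, 1);
""")

# Number of matches hosted in each country, via the venue each match was played at.
matches_by_host = conn.execute("""
    SELECT v.country, COUNT(*) AS matches
    FROM matches AS m
    JOIN venues AS v ON v.venue_id = m.venue_id
    GROUP BY v.country
    ORDER BY v.country
""").fetchall()

print(matches_by_host)
# [('Australia', 3), ('New Zealand', 1)]
```

Note that an inner join like this silently drops any match whose venue is missing; a `LEFT JOIN` from `matches` would keep those rows with a `NULL` country instead.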