What’s in the database

I will regularly refer to data pulled from my database, so I thought it would be useful to summarise which matches it covers and what data I have for them.

I will post this as a static page at some point and update that (and link to it from here) but I won’t update this page.

The database was last updated on 30 December. I will make regular updates.

The database currently contains data for 6326 Twenty20 matches played since the format was introduced in 2003. The vast majority produced a clear result. 168 had no result at all, while there were 110 ties. 65 of those were resolved with a super over, 12 were resolved with a bowl-out (a previous method of breaking a tie) and 33 were other ties: either there was no tie-breaker, or it isn’t recorded in the data.

Match type      # of matches
Clear result    6048
No result       168
Super over      65
Other tie       33
Bowl-out        12

I have collected all matches which are classified as ‘Twenty20 International’ for either men or women, or ‘Twenty20’ for men. This doesn’t cover all matches played by Twenty20 rules – only certain teams qualify for particular statuses.

For Twenty20 Internationals (at least for men), only matches between the Test nations and Scotland, Ireland, Afghanistan, Netherlands, Hong Kong, United Arab Emirates, Papua New Guinea and Oman count as T20Is. Similar rules apply for women. If one of these countries plays a country without status (for example, Afghanistan vs Nepal), that match is not a T20I.

The status ‘Twenty20’ is the short-form equivalent of ‘first-class’ for multi-day matches (e.g. Sheffield Shield) or ‘List A’ for the domestic equivalent of one-day internationals. It applies to the top-level domestic competitions in each Test-playing country, but it also applies to matches in major ICC tournaments that don’t count as T20Is. At the World Twenty20 Qualifier, for example, some matches were T20Is and others were Twenty20 matches.

For women, there is no equivalent Twenty20 status for top-level domestic leagues, so I have had to identify individual tournaments and collect those data. I have included the following countries and years:

  • Australia 2009-10 to 2015-16
  • New Zealand 2007-08 to 2015-16
  • Pakistan 2008-09 and 2012
  • South Africa 2008 and 2015-16
  • Sri Lanka 2009-10 and 2015-16
  • West Indies 2012
  • ICC Americas 2010 and 2012
  • ICC Europe 2012 and 2014
  • Women’s World Twenty20 Qualifier 2013 and 2015

Interestingly, I couldn’t find any data for domestic women’s tournaments in England or India. I understand there is data in places like CricketArchive, but my current system can’t capture it.

I have defined all of these matches as ‘Women’s Twenty20’ and I will usually include them in player statistics and match statistics. Unlike the men’s game, there is a big sample bias towards three countries: Australia, New Zealand and Sri Lanka. Where that bias could distort an analysis, I will often limit the sample accordingly.

Here is the number of matches I have for each format:

Match type          # of matches
Men’s T20I          473
Men’s Twenty20      4883
Women’s T20I        325
Women’s Twenty20    645

Another way to look at the data is the number of matches I have per country.

Country                 Men     Women
England                 1289    75
India                   1074    21
South Africa            676     45
Australia               268     303
New Zealand             298     185
Pakistan                441     22
Sri Lanka               332     103
West Indies             287     97
Bangladesh              205     35
United Arab Emirates    228     8
Zimbabwe                118     0
Other                   140     76

The SQL database consists of a number of tables which contain different data. These include matches, innings, batting stats, bowling stats, fall of wickets data, score by over, partnerships, player vs player data tables, as well as tables with data on players, teams, each series, and venues. The batting stats table, in addition to telling you how many runs were scored and how many balls were faced for each batsman, also doubles as a list of which players were in each match, so there’s a row for players who didn’t bat.
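To illustrate how a batting stats table can double as a list of who played in each match, here is a minimal sketch using Python’s built-in sqlite3 module and a toy in-memory database. The table name ‘batting’, its columns, and the convention that players who didn’t bat have NULL runs and balls are all assumptions made for the example, not the real schema.

```python
import sqlite3

# Toy in-memory database. The table and column names below are
# illustrative assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batting (
        match_id  INTEGER,
        player_id INTEGER,
        runs      INTEGER,  -- NULL if the player did not bat (assumed convention)
        balls     INTEGER   -- NULL if the player did not bat (assumed convention)
    );
    INSERT INTO batting VALUES
        (1, 101, 45, 30),
        (1, 102, 12, 10),
        (1, 103, NULL, NULL),  -- selected for the match but did not bat
        (2, 101, 7, 9);
""")


def players_in_match(conn, match_id):
    """Everyone with a row for the match, whether or not they batted."""
    rows = conn.execute(
        "SELECT player_id FROM batting WHERE match_id = ? ORDER BY player_id",
        (match_id,),
    )
    return [row[0] for row in rows]


def batters_in_match(conn, match_id):
    """Only the players who actually batted in the match."""
    rows = conn.execute(
        "SELECT player_id FROM batting "
        "WHERE match_id = ? AND balls IS NOT NULL ORDER BY player_id",
        (match_id,),
    )
    return [row[0] for row in rows]


print(players_in_match(conn, 1))
print(batters_in_match(conn, 1))
```

The first query returns the full list of players for the match, while the second filters it down to those with a recorded innings.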

I have data on every innings played in each match, and I have batting and bowling stats for all matches except those which were abandoned before they started. Likewise, the fall of wickets and partnerships data is reasonably complete, although in some cases it is missing which players were in each partnership or which player lost their wicket at a particular point.

The database also contains information on the score at the end of each over, but this only exists for 3923 matches. I also have batting and bowling stats broken down by each combination of batsman and bowler, but only for about 3061 matches. These are still substantial samples, but it is important to check that you are using comparable data when including them in any analysis.
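One way to keep samples comparable is to restrict an analysis to matches that have rows in every table it relies on. The sketch below shows the idea with Python’s sqlite3 module and a toy database. The table names (‘matches’, ‘over_scores’, ‘player_vs_player’) and their columns are hypothetical stand-ins, not the real schema.

```python
import sqlite3

# Toy in-memory database with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (match_id INTEGER PRIMARY KEY);
    CREATE TABLE over_scores (
        match_id INTEGER, innings INTEGER, over_no INTEGER, score INTEGER
    );
    CREATE TABLE player_vs_player (
        match_id INTEGER, batsman_id INTEGER, bowler_id INTEGER,
        runs INTEGER, balls INTEGER
    );
    INSERT INTO matches VALUES (1), (2), (3);
    INSERT INTO over_scores VALUES (1, 1, 1, 6), (1, 1, 2, 15), (2, 1, 1, 4);
    INSERT INTO player_vs_player VALUES (1, 101, 201, 10, 6);
""")

# Only these detail tables may be named, so the table name that is
# interpolated into the SQL string is never arbitrary input.
ALLOWED_TABLES = ("over_scores", "player_vs_player")


def matches_covered_by(conn, tables):
    """Return ids of matches with at least one row in every named table."""
    for table in tables:
        if table not in ALLOWED_TABLES:
            raise ValueError(f"unknown detail table: {table}")
    conditions = " AND ".join(
        f"EXISTS (SELECT 1 FROM {table} WHERE {table}.match_id = m.match_id)"
        for table in tables
    )
    sql = "SELECT m.match_id FROM matches AS m"
    if conditions:
        sql += " WHERE " + conditions
    sql += " ORDER BY m.match_id"
    return [row[0] for row in conn.execute(sql)]


print(matches_covered_by(conn, ["over_scores"]))
print(matches_covered_by(conn, ["over_scores", "player_vs_player"]))
```

Comparing the size of the restricted list with the full list of matches gives a quick sense of how much of the database a given analysis actually draws on.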

I also have a table with personal information on 7223 players, including their name, their nationality, their gender (derived from which games they have played in) and their batting and bowling style. For some players I also have data on their birthdate and height, although this isn’t currently in a usable format. This table includes 5728 male players and 1492 female players.

I also have data on each series played. A ‘series’ can be a major tournament like the Big Bash League or the World Twenty20, or just a tour or a single match played between two countries. Each series represents only one season’s worth of play, so the Big Bash League appears in the table four times. I have a field called ‘multi_year_series’ which can be used to group together tournaments which happen in multiple years. The table currently has data on 467 series.
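As a rough sketch of how a grouping field like ‘multi_year_series’ could be used, the example below counts seasons and matches per tournament with Python’s sqlite3 module. Only the ‘multi_year_series’ field name comes from the description above. The other table and column names, and the sample rows, are made up for illustration.

```python
import sqlite3

# Toy in-memory database. Apart from the multi_year_series field, the
# table and column names are assumptions, and the rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (
        series_id         INTEGER PRIMARY KEY,
        season            TEXT,
        multi_year_series TEXT
    );
    CREATE TABLE matches (
        match_id  INTEGER PRIMARY KEY,
        series_id INTEGER
    );
    INSERT INTO series VALUES
        (1, 'Season 1', 'Big Bash League'),
        (2, 'Season 2', 'Big Bash League'),
        (3, 'Season 1', 'World Twenty20');
    INSERT INTO matches VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")


def seasons_and_matches(conn):
    """Return (tournament, number of seasons, number of matches) tuples."""
    rows = conn.execute("""
        SELECT s.multi_year_series,
               COUNT(DISTINCT s.series_id) AS seasons,
               COUNT(m.match_id)           AS matches
        FROM series AS s
        LEFT JOIN matches AS m ON m.series_id = s.series_id
        GROUP BY s.multi_year_series
        ORDER BY s.multi_year_series
    """)
    return rows.fetchall()


for tournament, seasons, matches in seasons_and_matches(conn):
    print(tournament, seasons, matches)
```

The LEFT JOIN keeps a series in the result even if no matches are recorded for it, and COUNT(DISTINCT ...) stops a season being counted once per match.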

I have a table with data on 414 teams. This data includes the gender of the team (the Australian men’s team and women’s team are treated separately) and the home country of every team. In some cases I have also filled in the geographic name of the team (‘Adelaide’) and the nickname (‘Strikers’).

Finally, the venues table contains data on 348 venues. This includes the country that the venue is in. By joining this table to the matches table, you can analyse match data by host country.
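The join between venues and matches might look something like the sketch below, written with Python’s sqlite3 module against a toy database. The table and column names (‘venues’, ‘matches’, ‘venue_id’, ‘country’) are assumptions for the example, and the rows are placeholder data rather than real venues.

```python
import sqlite3

# Toy in-memory database with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venues (
        venue_id INTEGER PRIMARY KEY,
        name     TEXT,
        country  TEXT
    );
    CREATE TABLE matches (
        match_id INTEGER PRIMARY KEY,
        venue_id INTEGER
    );
    INSERT INTO venues VALUES
        (1, 'Venue A', 'Australia'),
        (2, 'Venue B', 'Australia'),
        (3, 'Venue C', 'England');
    INSERT INTO matches VALUES (1, 1), (2, 2), (3, 3), (4, 1);
""")


def matches_by_host_country(conn):
    """Count matches per host country by joining matches to venues."""
    rows = conn.execute("""
        SELECT v.country, COUNT(*) AS n_matches
        FROM matches AS m
        JOIN venues AS v ON v.venue_id = m.venue_id
        GROUP BY v.country
        ORDER BY n_matches DESC, v.country
    """)
    return rows.fetchall()


for country, n_matches in matches_by_host_country(conn):
    print(country, n_matches)
```

Note that an inner join like this silently drops any match without a venue recorded, so a LEFT JOIN may be a better choice if that case exists in the data.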