SQL Test Scores Database
Russell Degnan

Apropos to David Barry generously offering his collection of statistical data, and having finally got to updating my parser for the changes to cricinfo's scorecards, I'll do the same.

Download the zip file available here: testscores.zip.

Instructions for use:

To begin

  1. Unzip testscores.sql and import into your database. I use mysql, running on XAMPP (which includes PHPMyAdmin).

That is sufficient to look at the data. It includes all test matches (bar the ICC XI travesty) up until 24th October 2010. If anyone wanted to rewrite the parse script for ODI/T20 games feel free to do so and I'll add it to the file.

The format is relatively straight forward but you'll need to work out what is what for yourself (and know some basic sql. I'll leave comments open on this post for questions).

  • team - numbered 1 to 10, in the order they first played test cricket.
  • game - a test match, use g_id as index reference, g_cricinfoid to reference the cricinfo scorecard.
  • innings - a team innings, referenced by i_id, references game with i_testid, team by i_teamid and bowling team by i_oppid
  • player - a player, uses the name first encountered, p_id for reference, p_cricinfoid for the cricinfo player id, none of the details are filled (TODO).
  • bat - a batsman's innings, references i_id via b_inningsid and p_id via b_bat_pid.
  • bowl - a bowler's innings, reference i_id via w_inningsid and p_id via w_bowl_pid.
  • extras - extras in an innings, references i_id via e_inningsid.
  • fow - the fall of wickets in an innings, including injured partnerships (marked as unbroken, with two partnerships having the same wicket number. FOW needs careful coding, naive queries will be slightly off because of not outs and retirements. References i_id via f_inningsid, b_id via f_open_bid (batsman in middle, or 1st in order for openers), f_dis_bid (batsman dismissed if any), f_no_bid (batsman not out).
  • close - score at close of play and batsmen/overs (if available, scorecards are incomplete). References i_id through c_bat_iid and batsmen b_id through c_bat1_bid and c_bat2_bid. Needs parsing of available game notes (g_notes) for luncheon/drinks intervals (TODO).
  • series - collated series of games. References game numbers via s_testid_list.

To update

  1. Download the scorecard from cricinfo and save somewhere. I have a batch downloader for people running their own webserver (again, XAMPP) called score.php that gets the 20 most recent scorecards; add ?page=X to the URL to get older cards. Copy them to a download directory.
  2. You need perl, I use Strawberry perl for windows, as apparently does Larry Wall, but suit yourself. You also need to download (via CPAN) the following packages:
    Text::CSV_XS;
    HTML::TreeBuilder;
    DBI;
  3. Strip the crud around the scorecard with the command:
    perl strip.pl < download/XCRICINFOID.html > clean/XCRICINFOID.html

  4. Run the parser to add to the database. Will delete all records for that gameid, but can cause unwanted records if it errors (which it might if cricinfo changes their format or for other reasons).
    perl players.pl clean/XCRICINFOID.html
  5. Update series data (and do any other post-processing):
    INSERT INTO Series ( s_season, s_home_tid, s_away_tid, s_length, s_date, s_home_win, s_away_win, s_drawn, s_tied ) SELECT g_season, g_home_tid, g_away_tid, max( g_seriesnum ), min( g_date ), sum( g_home_tid = g_result_tid ), sum( g_away_tid = g_result_tid ), sum( g_result_type like 'drawn%' ), sum( g_result_type like 'tied%' ) from Game where g_seriesid is null group by g_season, g_home_tid, g_away_tid order by g_testnum;
    UPDATE game,series set g_seriesid = s_id where s_home_tid = g_home_tid and s_away_tid = g_away_tid and s_season = g_season;
    UPDATE series set s_testid_list = (select group_concat(g_id) from game where g_seriesid = s_id group by g_seriesid);
    UPDATE game, innings set i_oppid = g_away_tid where g_id = i_testid and g_home_tid = i_teamid;
    UPDATE game, innings set i_oppid = g_home_tid where g_id = i_testid and g_away_tid = i_teamid;
    UPDATE bat,innings set b_testid = i_testid where b_inningsid = i_id;
    UPDATE bowl,innings set w_testid = i_testid where w_inningsid = i_id;
    UPDATE extras,innings set e_testid = i_testid where e_inningsid = i_id;
    UPDATE fow,bat set f_testid = b_testid, f_inningsid = b_inningsid where f_open_bid = b_id;

Any problems or suggestions drop a comment here.

Known Problems

  • Players who change names get the first instance of their name (notably MoYo), best manually edited.
  • Grounds are stored by name, not id.(TODO). This gets around the above problem however.

Cricket 24th October, 2010 12:23:18   [#] 

Comments

SQL Test Scores Database
Thanks a ton!
soulberry  24th October, 2010 19:41:59  

SQL Test Scores Database
You're welcome!
Russ  25th October, 2010 22:36:51  

SQL Test Scores Database
Hi,

I've been coming to ur site once in a while and have been finding ur work very interesting, specially the test ratings. I tried what has been explained above, but still its a bit confusing for me. Can you help me out with this if you have time? My mail id: vishall2402@gmail.com
vishal shah  21st March, 2011 07:51:51  

SQL Test Scores Database
Hi Vishal, happy to help, but it is probably easier (and more helpful for others) if you ask questions here and I'll answer them as best I can.
Russ  21st March, 2011 20:42:41  

SQL Test Scores Database
Do you have a downloadable copy of the script which we can use to parse and download cricinfo scorecards?
Sumit Gupta  2nd January, 2012 18:03:30  

SQL Test Scores Database
Sumit, the zip file contains the script. You can modify it as you need. I will update it further in the next few days.
Russ  3rd January, 2012 09:23:25  

SQL Test Scores Database
Fantastic and immense effort, I guess. I was starting to write one myself, but thought of searching for one in the net and stumbled upon this. Excellent database with almost perfect database design. Thank you very much.
Unni  27th January, 2012 01:36:46  



Add a comment
Title  (opt)
Name 
Email  (opt)
Website  (opt)