Notice that, as in the ICC ratings, New Zealand and, particularly, the West Indies are not far in front of Bangladesh, Ireland and Zimbabwe. This ought not to be a shock: the West Indies have not beaten a team above them in the ratings in over a year, and have won just 3 games against those teams in the past 4 years (a worse record than Bangladesh). New Zealand were recently thrashed by Bangladesh, who were themselves beaten by Ireland and the Netherlands before that.
Having an extended qualification process favours the best team going through, but if the weaker "big" team loses all its other games, and the stronger "minnows" win against their weaker opponents, then qualification comes down to the result of a single game. Qualification remains unclear, even if the other 20 group games are a little uninteresting. By running Monte Carlo simulations of the group games, we can produce some qualification tables. Direct is the probability of having more wins than the 5th-placed team; indirect is the probability of qualifying on net run rate. The indirect probability is split evenly between the tied teams (calculating margins is more difficult), which favours the minnows a little (given they are more likely to suffer a blow-out).
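A minimal sketch of how such a simulation might be set up is given here. The team labels, the ratings and the logistic mapping from rating gap to win probability are all illustrative assumptions, not the model or figures behind the tables; teams level on wins with the 5th-placed side share the remaining places evenly, as a stand-in for net run rate.

```python
import random

def win_probability(rating_a, rating_b, scale=100.0):
    # Hypothetical logistic link between rating gap and win chance (an assumption).
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def simulate_group(ratings, spots=4, trials=10000, seed=0):
    """Return (direct, indirect) qualification probabilities for each team."""
    rng = random.Random(seed)
    teams = list(ratings)
    direct = {t: 0.0 for t in teams}
    indirect = {t: 0.0 for t in teams}
    for _ in range(trials):
        wins = {t: 0 for t in teams}
        # Single round robin: every pair of teams meets once.
        for i, a in enumerate(teams):
            for b in teams[i + 1:]:
                if rng.random() < win_probability(ratings[a], ratings[b]):
                    wins[a] += 1
                else:
                    wins[b] += 1
        # Wins of the first team outside the qualifying places (5th when spots=4).
        cutoff = sorted(wins.values(), reverse=True)[spots]
        clear = [t for t in teams if wins[t] > cutoff]
        tied = [t for t in teams if wins[t] == cutoff]
        for t in clear:
            direct[t] += 1
        # Remaining places are split evenly between teams tied on the cutoff.
        share = (spots - len(clear)) / len(tied)
        for t in tied:
            indirect[t] += share
    return ({t: direct[t] / trials for t in teams},
            {t: indirect[t] / trials for t in teams})

# Hypothetical seven-team group; the ratings are made up for illustration.
example_ratings = {"A": 125, "B": 120, "C": 115, "D": 95, "E": 80, "F": 60, "G": 50}
direct, indirect = simulate_group(example_ratings, trials=2000)
```

In every trial exactly four places are handed out, so the direct and indirect columns together always sum to four across the group.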
This is the more straight-forward of the two groups, with the two weakest associates, the largest gap from 4th to 5th, and smallest from 3rd to 4th. However, Zimbabwe's recent record is promising, and New Zealand's weak; the probability of the top four going through directly is just 53 percent, though that means it remains more likely than not.
By contrast, group B is more likely to be close. In just 23 percent of scenarios do the top 4 make it through directly. Both Ireland and Bangladesh are capable of getting above the West Indies, as the leading three teams are unlikely to drop points. The West Indies remain favourites to progress, but in only one third of scenarios can they do so directly.
Perhaps, therefore, the group stages will not be completely uninteresting. There are several games worth watching closely, and - for those wanting to gamble - it is more likely that the big-eight will not progress than all succeed.
Cricket - Analysis 4th February, 2011 08:09:05 [#] [2 comments]
"...was dismissed just before the close of play"
|Set Batsmen||Not Out||Dismissed||Total|
This is filtered by batsman well set (had scored 20 runs already) and more than 20 instances of being around at the close. As you'd expect it consists mostly of players with exemplary techniques and a willingness to come back tomorrow to score more.
At the other end of the table are some surprises:
|Set Batsmen||Not Out||Dismissed||Total|
A few of these players could be described as lazy or overly aggressive, but it is nevertheless a surprise to see greats such as Weekes, Lloyd and Miandad, or a batsman as consistent as Richardson, getting out 30% of the time they are present at the close of play. Perhaps more surprising is the presence of two current England players in Pietersen and Strauss, as well as Cook (31.6% off 19 innings). You can never rule out pure bad luck in this type of statistic, but "declared half an hour before the close of play" might be worth considering for the Australians this summer.
Turning to the benefits of night-watchmen, the percentage of dismissals overall climbs to 29.9% for all players not set (less than 20) dismissed before the close. That is, a batsman is 50% more likely to be dismissed before the close if they are new to the crease. The players being dismissed the most aren't too surprising, though perhaps their level of vulnerability is:
|Unset Batsmen||Not Out||Dismissed||Total|
Shane Warne was perhaps not the best choice as a night-watchman. Other notably high players, among the batsmen, include openers Greenidge (33.3% off 27 innings) and Gooch (30.3% off 33), and candidates for a night-watchman: Thorpe (30.8% off 26), Ponting (30.4% off 23) and Lara (27.3% off 22). At the reliable end:
|Unset Batsmen||Not Out||Dismissed||Total|
Thus saving me the ire of the Indian blogging mafia. Surprisingly, Clive Lloyd almost made this table at 10.0%. Note the high position of potential night-watchmen Anderson and Russell.
Finally, there is a perception that Michael Clarke ought to be in these lists on the negative side. Perception is easily influenced by specific events. In this case, Clarke is no worse than most, being dismissed 4 times in 23 (17.4%) when set, and 3 times in 11 (27.3%) when not. What mattered is that those four times came in important games: twice in the 2005 Ashes at crucial times in games that were lost; again, crucially, in a loss to India in Mohali in 2008; and once in 2009 in the draw in Cardiff. Like scoring runs, sometimes context counts.
Cricket - Analysis 21st November, 2010 14:18:29 [#] [4 comments]
As the reality of an Indian defeat to South Africa became apparent, Ducking Beamers posed an interesting question on the nature of India's stay at number one on the official rankings. Given the official rankings are supposed to be transparent and simple, this should be a relatively easy question to answer. Unfortunately, the official rankings are neither transparent, nor simple. The formula is simple enough, but when assessing its merits as a predictor, something is lost.
Firstly, ignoring wikipedia's series points (which don't affect the maths) you'll note that the ranking varies depending on how closely matched a team is. There is a reason for this, which I will get to later, but let's first note the formula for a standard rating:
series_result * (rating_opp + 50) + series_result_opp * (rating_opp - 50 )
This can be simplified, greatly, as follows:
series_result * (rating_opp + 50) + (series_length - series_result) * (rating_opp - 50 )
= series_result * rating_opp + series_result * 50 + series_length * rating_opp - series_result * rating_opp - series_length * 50 + series_result * 50
= series_result * 100 + series_length * (rating_opp - 50 )
In other words, the rating is made of two parts: the result multiplied by 100, which holds true regardless of opposition (it is included in the alternative methods as well), and a rating adjustment for opposition that takes no account of the result. Strange choice. I won't say this doesn't work, but it strikes me as odd.
How, then, did India manage to get to number one? Well, oddly enough, on merit:
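The simplification can be checked numerically. This sketch uses the variable names from the formula above; the function names and the example series are illustrative only.

```python
def series_points_full(series_result, series_length, rating_opp):
    # Original form: points for results won plus points for results conceded.
    series_result_opp = series_length - series_result
    return (series_result * (rating_opp + 50)
            + series_result_opp * (rating_opp - 50))

def series_points_simplified(series_result, series_length, rating_opp):
    # Simplified form: a result term and an opposition term.
    return series_result * 100 + series_length * (rating_opp - 50)

# Example: 2 wins from a 3-game series against a side rated 110.
full = series_points_full(2, 3, 110)
simple = series_points_simplified(2, 3, 110)
print(full, simple)
```

Both forms give 380 points for that example: 200 from the results and 180 from simply having played a 110-rated side three times.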
Aus Eng Ind Pak NZ SAf Sri WI Ban
win pt 2550 1925 2300 700 1000 2300 1950 800 375
opp pt 1725 2093.5 1920.5 1087.5 1071 1817 1270 757 0
str pt 326 60 130.5 44 -22.5 106.5 81 0 0
weak pt 0 0 0 0 213 0 0 677 -166.5
games 39 39 35 22.5 28.5 34.5 29 29 21.5
avg win 65.38 49.36 65.71 31.11 35.09 66.67 67.24 27.59 17.44
avg opp 52.59 55.22 58.6 50.29 44.26 55.75 46.59 48.93 -7.74
rating 117.97 104.58 124.31 81.4 79.35 122.42 113.83 76.52 9.7
The avg win and avg opp are the key fields here. Note that India have almost the highest avg win (which, broadly speaking is just a percentage of games won) and the highest opposition value. Their opponents have actually been harder than any other team's. (Note also, that the ratings above are a little approximate, due to rounding and other calculation difficulties).
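As a rough check, the averages and ratings for the top three sides can be rebuilt from the point totals in the table. The relationship used here (win points over games, plus all opposition-related points over games) is inferred from the table rather than taken from any official description, and not every row reproduces exactly, which fits the caveat about approximation.

```python
# (win pt, opp pt, str pt, weak pt, games) copied from the table above.
table = {
    "Aus": (2550, 1725, 326, 0, 39),
    "Ind": (2300, 1920.5, 130.5, 0, 35),
    "SAf": (2300, 1817, 106.5, 0, 34.5),
}

def rebuild_rating(win_pt, opp_pt, str_pt, weak_pt, games):
    avg_win = win_pt / games
    # Assumption: strong- and weak-opposition adjustments are folded into avg opp.
    avg_opp = (opp_pt + str_pt + weak_pt) / games
    return avg_win, avg_opp, avg_win + avg_opp

ratings = {team: rebuild_rating(*row) for team, row in table.items()}
for team, (avg_win, avg_opp, rating) in ratings.items():
    print(f"{team}: avg win {avg_win:.2f}, avg opp {avg_opp:.2f}, rating {rating:.2f}")
```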
Should we then all acknowledge India as (at least for the moment) the undisputed test number one? Possibly not. Because this rating system is a long way from being infallible.
Let's start at the bottom. Bangladesh achieve an impossibly low rating, given there is an automatic 50 points (on average) for playing someone else. This is because they don't get those points; giving them would break other things. If Bangladesh were rated in the normal way, their rating would be close to 50 (practically no points for winning, but an opposition rating of 50). If a team played Bangladesh, then their maximum points from that contest would be 100 + Bangladesh's rating - 50, or about 100. In other words, playing rubbish sides hurts your ranking, because the 50 point calculation artificially limits it (the same applies to New Zealand, Pakistan and the West Indies now - the most you can get is 130 points).
To get around this the rating system does something odd - in a mismatch, it ignores the rating, and gives a team the points for winning, plus your rating minus 90 (or your rating above 100 plus 10, since the rating system is centred - sort of). But Bangladesh, being rubbish, get the win points plus their rating MINUS 10. Which is a negative number.
And in case you don't think this is a great injustice, note this: if Bangladesh were to perform as a below average side, winning 1 in 3 games (or roughly the same as the three teams above them) for three whole years (the entire measured period of the ratings), their ranking would be about 40, whereas those other teams would still be ranked about 80. That is not right.
For teams playing against inferior opposition, it is possible to endlessly increase your rating, provided that you maintain a win percentage of 90%. That number is assumed by the rating system, regardless of the quality of the lowly-ranked opposition. Thus if your ranking is high, it is much better to play Bangladesh than the West Indies, New Zealand or Pakistan, against whom a 90% win percentage is actually difficult.
Perhaps fewer readers will care about the much smaller injustice faced by Australia, but note that it might soon happen to India, and worry. When Australia had a ranking of 140, their opposition took 90 points per game played against them, regardless of result, while Australia got the opposition rating (around 50). In general, a team should garner as many points from each series as their ranking would expect, and so, while Australia remained a 140 team, their ranking remained at 140. Thus, when playing India, a 110 team, Australia would get 60 points from playing India, and 80 (on average) from wins.
But when Australia became equal with India (more or less), their points are redistributed, raising India up to a 120 team and lowering Australia down. Australia's points from winning drop, to just 50, and India's increase, up to 50 (from 30). But, in that immediate period the points from playing the opposition do not change, Australia continues to get 60 points for playing India, and India 90 for playing Australia. The rating change over-shoots a little.
Now, this should not matter, because, the rankings would balance out after a few series as different teams compete against each other. However, these ratings have a cut-off. Every August the ratings are rolled over, the fourth year is discarded, the second given half its value. What happens then, as happened last August, is that the parts of the average maintaining Australia's high rating, in spite of the over-shoot, are discarded, and the average drops far below what it should be. Next August, Australia's 5-0 Ashes triumph will disappear (average points: 160), and they'll probably drop to fourth again (or worse).
The oddest aspect of this though, is not that the cut-off has strange effects, but that a cut-off is entirely unnecessary. The ratings are balanced against each other; if a better rated team does worse than expected their rating will fall. Results from three years ago already have very little bearing on the rating, because more recent results pull a rating into place like a pendulum. A weighting for the new ranking, based on the number of games played in recent years is both sufficient and better.
Apart from being completely unreliable for teams which never win, or teams that always win, or teams that have had a recent change in their rating, or have done so in the past four years, the ICC ratings are a moderately accurate measure of a team's performance. This shouldn't be surprising, however. Of the dozen or so rating systems in existence, all of them are pretty good at predicting the easy things. Deciding who is the best team out of India, Australia and South Africa, however, is not possible, and any rating system putting more than a couple of percentage points between them (as the ICC one does, incidentally) is wrong.
What astounds me about the ICC system, though, is that in trying to be simple it is actually complicated, and yet, despite that complexity, it is in many ways mathematically unsound. It works, in spite of itself. Which is an odd thing.
Cricket - Analysis 11th February, 2010 20:01:36 [#] [0 comments]
Following on from my comments on Australia's propensity to collapse in the last ratings, TonyT notes that it has become so obvious that even the selectors have pointed it out. Following Tony's lead, I'll also restate what I said a year ago:
"Ponting is proud, so surrendering the number three slot is against his nature, but the number of collapses in the past year has been alarming."
Why are others so slow on the uptake? Why is it only now that people are noticing that, with the exception of the openers, the top six are unreliable, with Haddin, Clarke and North in particular prone to making runs only off the sturdiest of platforms? Partly, it is because averages hide those fallibilities. All of the batsmen average around 40 to 50 over the past two years, which is reasonable enough. The side's average total is solid enough and no worse than most others. The problem lies in the distribution.
The graph above shows the score distribution for each of the nine test teams. A comparison with England is most pertinent. England have what you'd expect a side to have: a roughly normal distribution of scores, centred around 330, with most of their scores roughly 150 runs on either side of this. Australia however, while at or near the top in scoring between 400 and 700, are near the bottom for scores between 250 and 350, and in the midst of the cellar dwellers for scores below 200.
The pronounced double peak indicates a batting lineup incapable of playing sensible innings in poor conditions - a problem shared by New Zealand and Pakistan. The top peak produces batting averages that hide a soft middle order. One that will turn a poor session into a disastrous one, and is subsequently incapable of winning series against good sides who don't have those bad sessions, and who can merely wait for the opportunity they'll invariably be presented with.
Australia's batting, in other words, is in serious trouble.
Update: Much of the above is wrong, or partly wrong, because of the distribution of very large scores. The graph below shows things a little better, and it isn't any prettier for Australia. 
Note that Australia have the worst record of any major (top 5) side for making between 170 and 290 (the collapse problem), but are getting to 400 at roughly the same rate as everyone (and well in front of England who don't make a lot of scores over 300). What is more noticeable in this graph is that Australia is failing to make very large scores (500-600) at the same rate as the other sides in the top-4. This is a consequence of having a lineup seemingly incapable of very large centuries, but it makes it doubly hard to rescue a game from a sub-300 first innings total.
 Each score has been converted to a normal distribution centred on that score, summed, then expressed as a percentage. Uncompleted innings have been projected forward.
 This shows a cumulative distribution, essentially the percentage (x8) of times a side is bowled out for less than that score.
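The smoothing described in the first note (a normal curve centred on each score, summed, then expressed as a percentage) can be sketched as follows. The 30-run spread, the grid and the innings totals are hypothetical choices for illustration; the projection of uncompleted innings is not attempted here.

```python
import math

def smoothed_distribution(scores, grid, spread=30.0):
    """Sum a normal curve centred on each score, scaled so the grid totals 100."""
    raw = []
    for x in grid:
        total = 0.0
        for s in scores:
            z = (x - s) / spread
            total += math.exp(-0.5 * z * z)
        raw.append(total)
    scale = 100.0 / sum(raw)
    return [value * scale for value in raw]

# Hypothetical innings totals with a collapse-or-cash-in pattern.
innings_totals = [150, 180, 410, 450, 520]
grid = list(range(0, 801, 10))  # evaluate every 10 runs from 0 to 800
curve = smoothed_distribution(innings_totals, grid)
```

With totals like these the curve has two humps and a trough around 300, which is the double-peak shape discussed above.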
Cricket - Analysis 11th January, 2010 23:12:39 [#] [2 comments]
Writing in to cricinfo, Tim Parsons raises the oft repeated point that England were unfairly disadvantaged in the World T20 Cup by the application of the Duckworth Lewis system. The circumstances of that game are no longer terribly important, but the likelihood of further D/L problems remains.
Unfortunately, neither Tim nor the commenters make a mathematical claim for why D/L is failing, except insofar as Tim claims - without evidence - that the small deduction in power play overs has advantaged the chasing team, and that there is limited data for T20 matches. This is a pity, because neither reason is correct, as far as I can tell, but there is a mathematical problem with the application of D/L in very short games.
The way D/L appears to be constructed (caveats for differences between the standard and the proprietary professional edition) is to create a symmetrical target. That is, if a side is chasing, there is a certain point they need to be at, given their available overs and wickets, to win if the game is abandoned. The symmetry comes about because it is assumed that if the batting side loses overs, they lose resources in proportion to the overs lost. This produces a line of expectation for the chasing side: the dark blue line in the graph below.
A short but relevant statistical aside
It is assumed, and even commented on at the link above, that the longer the period a team has to score at the set run-rate, the more difficult it is for them to achieve that. Which is true, empirically speaking, but not statistically.
Consider a situation where a team has unlimited batsmen and has to score at 8 an over to win. If the mean runs scored per over in this situation is greater than 8 then the more overs the team has, the better off they will be, because, probabilistically, the more overs there are, the more likely their run-rate will approach the mean (and the mean is above the required rate).
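A small simulation illustrates the point. The list of runs-per-over outcomes is made up (its mean is 8.5, just above the required 8), and wickets are ignored entirely, matching the unlimited-batsmen assumption.

```python
import random

# Hypothetical outcomes for a single over; the mean is 8.5 runs.
OVER_OUTCOMES = [2, 4, 6, 8, 9, 10, 12, 17]

def chance_of_keeping_up(overs, required_rate=8, trials=5000, seed=1):
    """Estimate the chance of scoring at or above the required rate."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        total = sum(rng.choice(OVER_OUTCOMES) for _ in range(overs))
        if total >= required_rate * overs:
            successes += 1
    return successes / trials

chances = {overs: chance_of_keeping_up(overs) for overs in (5, 20, 50)}
for overs, chance in chances.items():
    print(f"{overs} overs: {chance:.2f}")
```

The estimated chance of keeping up rises as the number of overs grows, because the achieved rate settles towards a mean that sits above the requirement.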
But, teams do not have unlimited batsmen. Even in a T20 game, their approach must be tempered against the need to preserve wickets. More importantly, even in a T20 game, the bowling side must take sufficient wickets to get batsmen to the crease for whom the mean scoring rate is below the required rate (tail-enders, in other words).
In which the problem with D/L becomes clear (ish)
Wicket-taking, therefore, becomes the imperative of the bowling side. And therein lies the problem with the resource reduction methods of D/L. Because the D/L is symmetrical, the reduction of several overs at the beginning of the innings assumes that the bowling side has had the worst possible start to the innings: no wickets.
Under the standard system, which projected that teams would chase the majority of the runs in a finishing burst, the chasing team was advantaged because the target for most of the game was below what they might reasonably need to be at. Under the professional system, specifically for higher run-rates, it was recognised that teams must chase a total almost linearly (or, to put it another way, the closer a required rate gets to the mean maximum scoring rate, the less conservative a team can afford to be). But an almost linear reduction for lost overs at the beginning of an innings results in a batting side getting their total reduced without any reduction in their ability to hit out.
Intuitively, this is recognised by those who argue a batting side should have their wicket allocation reduced in this situation, but there is a simpler method. Read backwards, the D/L resource table tells you the number of runs lost, with a reduction in the wicket resource (normally around 1-4). The simplest solution therefore, is to project how many wickets a bowling side might reasonably have taken in the lost overs, and add that number of runs to the required total. The effect on the curve can be seen in the light blue line, below.
For England, who might reasonably have expected to take 3-4 wickets in 11 overs of a T20 against the West Indies, the target would be adjusted from 81 to 87 or 88. Or roughly where most observers seem to think the target should have been. In a 50 over game, where wickets are less frequent, this sort of change would be less noticeable, which is, no doubt, why it has not been more obvious that over reductions favour the batting side slightly.
Cricket - Analysis 24th October, 2009 23:54:35 [#] [0 comments]
In a lovely testimonial to the batting exploits of Chris Martin, Old Batsman noted the high percentage of total runs Martin's highest score took up. Somewhat curious, I thought I'd have a look at it.
The biggest problem, obviously, is that it gets progressively harder to make your highest score dominate your total runs scored, and any straight query returns nothing but one-test wonders who scored all their runs in their only knock. Martin, in that sense, isn't close.
As a more realistic measure, we can multiply the percentage by the number of innings, to rate longevity higher. This is equivalent, however, to dividing the high score by the average (ignoring not outs). Martin hasn't played his career out yet, but even so, he loses out on percentage and on the index to the undisputed king of innings out of the blue: Jason Gillespie.
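The equivalence is easy to confirm with a made-up career. The figures here are hypothetical, not Martin's or Gillespie's, and the average is taken per innings, ignoring not outs.

```python
def longevity_index(high_score, total_runs, innings):
    # Share of career runs made in the highest score, multiplied by innings.
    return (high_score / total_runs) * innings

def high_score_over_average(high_score, total_runs, innings):
    # Highest score divided by runs per innings (not outs ignored).
    return high_score / (total_runs / innings)

# Hypothetical tail-ender: 60 innings, 120 runs, highest score 12.
index = longevity_index(12, 120, 60)
ratio = high_score_over_average(12, 120, 60)
print(index, ratio)
```

Both come to about 6 for this player: a highest score worth six times a typical innings.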
One shouldn't actually read anything into this list, because the number is mostly meaningless. But the size of Martin's lead over acknowledged bunnies in Chandrasekhar and Walsh is impressive.
Cricket - Analysis 19th March, 2009 22:02:15 [#] [0 comments]
As part of a continuing series on why Australia should drop Matthew Hayden, it is worth considering the claim that his struggles represent a loss of "form", instead of something intrinsic to his game. And how is it that one could tell, statistically, rather than merely by watching his increasingly frantic high-risk shot-making, and inevitable dismissal?
A moving average
The first, and most obvious, way is by looking at a player's average. Because Hayden has played so long, and because players must be judged on recent performances, the most obvious way is to choose some time period: the past year, the past two years, or some arbitrary date that makes them look especially bad.
The less obvious, but more objective way is to take a weighted average, where recent innings carry more weight than those played in the past. The weighting should probably be done by time (Ri*0.95^M where Ri = runs scored in innings i and M = months since innings) but for technical reasons it is easiest for me to do it by innings (Ri*0.95^I where I = subsequent innings played).
Ignoring not outs, Hayden's weighted average is now down to 35.47, having been as high as 53.12 at the end of last summer. A rapid decline, but only 3 1/2 runs worse than Hussey, whose decline has been even more marked.
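The per-innings weighting can be sketched like this. Dividing by the sum of the weights (a "weighted number of innings") is assumed to be the intended normalisation, and the scores are invented.

```python
def weighted_average(scores, decay=0.95):
    """Weighted average of scores listed oldest first; not outs are ignored."""
    n = len(scores)
    # The latest innings has weight 1; each earlier one is multiplied by decay again.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_runs = sum(w * r for w, r in zip(weights, scores))
    return weighted_runs / sum(weights)

# Hypothetical sequences with the same plain average of 50.
fading = [100, 100, 0, 0]
improving = [0, 0, 100, 100]
print(round(weighted_average(fading), 2), round(weighted_average(improving), 2))
```

The fading sequence comes out below 50 and the improving one above it, which is the whole purpose of the weighting.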
But there is another aspect to form, and that is the perception of luck. All batsmen have bad periods, because all batsmen are vulnerable to getting out for less than 20. Greg Chappell famously remarked, in the midst of a serious run of low scores, that he didn't know if he was out of form, since he hadn't batted long enough to know.
If a batsman averages 30, there may be two reasons. Firstly, it may be because they are not that good: that the distribution of their innings is that of a typical player of that average. And secondly that they are out of form: that a rash of unexpectedly low scores is not sufficiently off-set by hundreds and fifties that they typically make, and which would keep their average much higher.
Because scores are distributed unevenly, we can calculate consistency of performance by taking a weighted log average:
(2^(∑(all innings i)[log2(Ri*0.95^I)])/Wn)
where Wn = weighted number of innings played
If a player scored the same in each innings, their log average equals their normal average. Normally, this average is around half the normal average, thus, by dividing them, we get a consistency ratio. A normal ratio seems to be around 0.5. Players who score big hundreds amidst lower scores will have lower ratios (0.3-0.4), players scoring consistently, higher ratios (0.6-0.7).
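One way to compute such a ratio is sketched here. It applies the 0.95^I weight to each logarithm rather than inside it, an interpretation chosen because it makes the log average equal the normal average when every score is the same, as described. Treating ducks as a score of 1 (so the logarithm exists) is a further assumption, and the scores are invented.

```python
import math

def innings_weights(count, decay=0.95):
    # Latest innings has weight 1; earlier innings decay by 0.95 each.
    return [decay ** (count - 1 - i) for i in range(count)]

def weighted_average(scores, decay=0.95):
    weights = innings_weights(len(scores), decay)
    return sum(w * r for w, r in zip(weights, scores)) / sum(weights)

def weighted_log_average(scores, decay=0.95, floor=1.0):
    # Assumption: scores below `floor` (ducks) are lifted to it before taking logs.
    weights = innings_weights(len(scores), decay)
    log_sum = sum(w * math.log2(max(r, floor)) for w, r in zip(weights, scores))
    return 2 ** (log_sum / sum(weights))

def consistency_ratio(scores, decay=0.95):
    return weighted_log_average(scores, decay) / weighted_average(scores, decay)

steady = [40, 45, 50, 35, 50]   # hypothetical consistent scorer
feast = [5, 5, 5, 5, 200]       # hypothetical one-big-innings scorer
print(round(consistency_ratio(steady), 2), round(consistency_ratio(feast), 2))
```

The steady scorer's ratio sits close to 1, while the batsman living off one big innings gets a much lower ratio.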
But consistency, over short time periods, is also a measure of form. A player whose recent average is mostly the result of one big innings is obviously capable of large innings, but lacking in form. A batsman whose recent average reflects consistency of performance is likely to reproduce that form.
Combining the measures
Players' form and ability can therefore be judged against the two measures. Hussey, who last year had a recent average of 71.9, and a form ratio of 72, was clearly in rare form, that would, inevitably, end. Now, remarkably, he has regressed to the opposite end of the spectrum, averaging 39.15, with a form ratio of 39. Look at how the form ratio tracks Hussey's recent average. Once his form regresses to the 0.5 mark, he is likely to have a career average of around 48-50. Respectable, and worth persisting with. But what of the others?
The graph is a little hard to make out, but (ignoring Haddin, who is still finding his feet), Australia's batsmen fall into three groups.
Hussey and Ponting, whose form a year ago was stellar, but who have recently struggled, both dipping into the low 40s, before Ponting's recent recovery.
Katich and Clarke, whose averages have been high this year - Katich particularly, though he is still being penalised for having been dropped - but whose form is so consistent that they are due to fail. More particularly, while their recent averages are respectable, they aren't high for players in such good form, and need to make better use of their run scoring opportunities. Tracking back to "typical" form, their averages would decline to just 40.
Symonds and Hayden, whose averages were both in the high 40s, low 50s last year, but who have recently declined to the mid 30s. More pertinently however, their form has not dipped with their average. This indicates that their recent middling scores are what you'd expect for players of their current abilities.
While Hayden's decline is of recent vintage (a year ago, you'd expect him to average in the low-40s), it is materially different to Hussey's in that it isn't luck deserting him, but his ability to make decent scores. His starts are marked by failures every bit as bad as those where he doesn't start at all. And for an ageing player, that matters.
Cricket - Analysis 1st January, 2009 23:17:35 [#] [4 comments]
There was a point in the not-too-distant memory, when enforcing the follow-on was done as a matter of course. In Australia, if not elsewhere, those days have finished. If Headingley 1981 was a freak occurrence, never to be repeated, several other tests served notice that enforcing the follow-on was not always desirable.
After looking at Australia's recent record in enforcing, or not enforcing the follow-on, I am prepared to make a much stronger statement: unless there is insufficient time remaining to force a result, enforcing the follow-on is never worthwhile.
"Insufficient time" is an interesting problem. If rain is expected, and in England, one may reasonably always expect it, then enforcement may be necessary no matter what day it is. In other circumstances, it is worth considering what has happened when Australia has or hasn't enforced the follow-on.
As Gideon Haigh showed, the catalyst for a change in thinking came first from Mark Taylor. Having spent 152 overs in the Rawalpindi heat only two games into his captaincy, he declined enforcement in the first Ashes test in 1994-95. The former was drawn, but such a result is not common - the loss to India in Calcutta in 2001 was the only other non-victory - and overcoming the 200+ run deficit of a follow-on is near impossible. Yet, despite the problem of having to set a target, non-enforcement has not resulted in a single loss or draw in the eight instances since 1993-94.
The record, therefore, stands like this:
Non enforcement Wins: 8 Draws: 0 Losses: 0
Enforcement Wins: 9 Draws: 1 Losses: 1
Four of the nine victories when enforcing the follow-on were by an innings. The other victories had mostly small chases (the largest being 107), yet the prospect of a chase in the fourth innings looms in over 50 percent of instances. Even if Australia didn't have a long standing propensity to collapse chasing small targets, giving the opposition that chance can be dangerous. While much is said about the value of winning the toss and batting, it is worth remembering that it is the fourth innings, not the second (which is arguably better than the first), that presents the greatest difficulties for batsmen. Not enforcing the follow-on gives you the best of the batting and bowling conditions.
How much better?
Well, the average total for teams following on is 337 off 109 overs. The average for teams batting fourth is 235 off 81 overs. Because pitches vary, it is worth normalising those figures to the opposition's first innings. On average, teams following on score 104 runs more than they did in their first innings, off an extra 31 overs; teams batting fourth, on average, score just 26 runs more, off 2 fewer overs.
There is a lot of variation in these averages: teams have been rolled cheaply while following on, but more often than not they bat better than previously, and quite often much, much better. In 7 of those 11 instances, the following-on team batted for more than 110 overs; in the other 8, just once.
This puts the lie to the assertion of Mike Selvey and others, that Ponting somehow ceded England an advantage by not making them bat again. While England did well in their second dig, this was an anomalously good performance for a team batting fourth. Nor is it possible to predict how England would have performed had they been asked to follow on. But one thing is clear from past history. If there are more than 4 1/2 sessions to play, enforcing the follow-on is a mug's game. On average, it results in your bowlers bowling for longer, when they are still tired from the first innings; it means the possibility of having to chase runs on the final day is more than likely; and, if the large variations in scores made are any indication, allows control of the game to slip away.
There has been an unhealthy focus on psychology in the lead-up to the Ashes, and in the media coverage following the first test. Cricket may be a mentally challenging game, but ultimately, any advantage or otherwise doesn't exist until the scoreboard ticks over. There is a lot of cricket to go yet.
Cricket - Analysis 1st December, 2006 02:00:14 [#] [1 comment]
In the past few years statistics have got a real jump in cricket. While we lag behind baseball by some margin, coverages are now full of interesting (albeit sometimes pointless) graphics, and articles abound on this or that statistical artifact. Unfortunately, some of them are complete crap.
The Numbers Game is like that. I like to read it a lot, it often produces interesting numbers, like the contribution by the last 5 wickets or the series on individual strokes (although all of them have their flaws). But it has an annoying tendency to produce figures that are clearly no more than luck, as some sort of keen insight into the game.
I mentioned above that baseball is well beyond cricket in terms of statistics, and there is a simple reason for it. They understand error, and distributions, and the all important difference between transient and persistent phenomena. You never see an error estimation in the Numbers Game. If there were, he would soon realize that cricketers play so few games, and have such variable results, that seemingly sensible statistics like "the difference between performances on the sub-continent and away" are only valid for a few players with dozens of games on each. He clearly has no sense of distribution, for reasons we'll see shortly. And the consistency with which figures are presented that were the result of a few lucky innings, rather than a season-by-season result, is phenomenal.
Baseball understands this. And still they make mistakes. But to see what I mean, I highly recommend this article by Bill James. It lays out some of the problems in great detail, concluding in part, that many things can't be measured -- either there or not there -- because the errors in baseball are too large.
Cricket, if anything, is more so. Take this table of differences between the performances of left and right hand batsmen against different sides. A difference looks plausible, but the figures don't support one. The measure that should be looked at here is not the difference between the two, but whether the performances match what you would expect. When you look at those numbers, all sides are well within 4 runs or 10% of expectation (based on projecting the linear correlation with r-squared: 0.8968). Yet the averages themselves are so variable (it only takes a double-hundred or a half-dozen cheap wickets to shift one by a run or so) that any difference is probably just luck.
The difference between first and second innings performances is a little different. The correlation is much lower, so there is more going on than just standard variation. But here the time-frame (since 1990) is so long that you can't say anything useful about current sides, or even a side from the mid-90s. What should be plotted is the difference between the expected second innings averages and the actual second innings averages, for each side, for each year. Then we would either see a trend, or a lot of natural variation, or something in between.
But neither of those has anything on this week's effort. The use of standard deviation in this case is hopelessly misguided. For many reasons. The distribution of a batsman's scores is heavily skewed, with over a third (as a rule) less than 20. Standard deviation is not only more likely to measure the ability of a batsman to score big centuries (ie. Lara, Atapattu, Zaheer Abbas, Bradman), but more likely to be low when the median is near the mean (ie. when the average is low, as for Pollock, Marsh, and Hadlee).
But he didn't just use standard deviation, he created an index of average/st_dev. But look at what that is:
( runs / (innings - notouts) ) / sqrt( sum( diff_means ^ 2 ) / innings )
Which means several things:
- Not outs provide an arbitrary cap on the potential runs, and therefore affect the average less than the standard deviation. This is why there is an arbitrary 5000 run limit. Without that, you get Pollock (1.33), Brett Lee (1.21) and other useful lower end batsmen in the top 10.
- A high score affects an average much less than it affects the standard deviation (try it and see). Players without big knocks (Mark Waugh, Chanderpaul, Ranatunga) do better.
- The number of innings increases your index by the square root of that number. Hence a player with the same score distribution, but quadruple the number of knocks will have double the index.
It is an interesting figure, and consistency is in the mix, but so are lots of other factors you don't want, and do nothing but skew the figures.
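The first two points are easy to check for yourself. What follows is a rough sketch, not the Numbers Game's actual calculation: `consistency_index` is my own name for it, the scores are made up, and I have assumed the deviations are taken from the plain mean per innings (runs / innings), which the formula above doesn't spell out.

```python
import math

def consistency_index(scores, not_outs=0):
    """Batting average divided by the standard deviation of the scores."""
    innings = len(scores)
    runs = sum(scores)
    # Batting average: not outs are removed from the divisor.
    average = runs / (innings - not_outs)
    # Assumption: deviations are measured from the plain mean per innings.
    mean = runs / innings
    std_dev = math.sqrt(sum((s - mean) ** 2 for s in scores) / innings)
    return average / std_dev

# Hypothetical scores, for illustration only.
base = [20, 30, 40, 50, 60]

print("base:           ", round(consistency_index(base), 3))
# One not out lifts the average but leaves the standard deviation alone.
print("one not out:    ", round(consistency_index(base, not_outs=1), 3))
# One big knock lifts the average, but lifts the standard deviation far more.
print("plus a 250:     ", round(consistency_index(base + [250]), 3))
```

With these numbers the not out pushes the index up and the double-hundred drags it down, which is the opposite of what you would want from a measure of quality.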
There are at least two better ways to measure consistency. One is the median, which gives the central knock and is a reasonable way of telling how often a player gets a start. Another would be to remove the innings bias by dividing the current index by the square root of the number of innings (unsurprisingly, Bradman dominates given this measure).
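To see why the median says something the average doesn't, here is a small sketch with two invented batsmen who have the same mean score. The scores and the helper name `mean_and_median` are mine, and not outs are ignored to keep it simple.

```python
import statistics

def mean_and_median(scores):
    """Return the mean and the median (central knock) of a list of scores."""
    return statistics.mean(scores), statistics.median(scores)

# Hypothetical scores, for illustration only.
steady = [25, 30, 35, 40, 45]        # gets a start nearly every time
boom_or_bust = [0, 5, 10, 20, 140]   # one big knock, little else

for name, scores in (("steady", steady), ("boom or bust", boom_or_bust)):
    mean, median = mean_and_median(scores)
    print(name, "mean:", mean, "median:", median)
```

Both batsmen have the same mean, but the median separates the one who regularly gets a start from the one carried by a single innings.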
Regardless, a more than cursory examination of the statistics being produced would also help. Just because it is constructed to say something doesn't mean it necessarily does.
Cricket - Analysis 29th April, 2006 19:09:44
Australia's bowling stocks are not unusually low, but the turnover has been. Players who would normally have been picked years ago have been kept out by the solidity of Warne, McGrath, Gillespie, Lee, MacGill and Kasprowicz. This will change sooner rather than later, and probably in the next two years. Here is how I see it. The good news for Australia is that there do seem to be some good players available in the near and long term. The question is how to manage the transition.
Warne remains a genuine competitor, and I hope he stays till he's 40, but unfortunately for everyone except opposition batsmen, he also might not. Retirement looms ever closer.
McGrath is probably gone. Even when he comes back, if he comes back, it won't be for long. The Ashes at most. Then to replace the irreplaceable.
Lee is the current spearhead, but not so long ago he was dropped for form. Since then he has had just one good series against quality opposition (away to South Africa; with due respect to the Windies), and even there 12 of his 17 wickets were tail-enders. The good news is his economy rate is dropping, but he needs to stay on it. An average in the low 30s won't be good enough.
Kasprowicz showed nothing of particular worth against South Africa. In a young bowler you might persist, but his age (34) tells against him, and he should be dropped sooner rather than later.
Clark had the best debut by a bowler in a long time, and looked pretty solid doing it. He is almost 31, though, and a first-class average near 30 makes him, at best, a two-year stop gap, and at worst, a one-series wonder.
MacGill is another on the wrong side of 35, can't field and can't bat. He might be tempted to play till Warne retires, but a successor he isn't. Australia needs another spinner, and getting someone in to learn from the current master should be the strategy.
The pace prospects:
Bracken was unlucky to get dropped, given his recent record and first-class form. But the selectors picked Clark right, even if Kaspa was wrong. On the right side of 30, he should be playing.
Gillespie, for me, unlike most commentators, is not finished. Not quite 31, he has at least three seasons in him, and remains the best line-and-length bowler Australia has after McGrath. It will be the Australian team's loss if he spends his last days knocking over Shield sides for not many.
Tait has injured himself again, but frankly doesn't need to be playing Tests for a couple of years if we can avoid it. As Lee is finally learning, accuracy counts at the top level and Tait needs some before he plays again.
Johnson is another young player with big raps on him. Performed well in the Shield final but averaged 30 over the season. Like Tait, remains more of a prospect than an option.
Hopes has the advantage of being an all-rounder and the disadvantage of no one being sure of what. Needs to do one thing really well before he'll get a Test look-in.
Watson is a better prospect than Hopes. Still young, but unfortunately more of a batsman than a bowler. A reasonable bet if Australia plays two spinners, but not otherwise.
Dorey seems to have an alright record given his limited experience. His one-dayer experience was a let-down, but he will probably be back.
Griffith isn't mentioned much, but was the highest wicket-taker in the Shield this season. Seven five-fors in 30 matches speaks well of him, although, like Dorey, he'll be 30 in a couple of seasons.
And the spin prospects:
Cullen is very young and a spinner. Both rare commodities. Picked for Bangladesh, but struggled this season. Probably a decade from his best cricket.
White, as the only young leg-spinner around, must consider himself a chance, but like Watson, his batting is stronger than his bowling. Might come on, but lacks the unnatural spin or drift a top-class spinner would have.
With Symonds or Clarke in the side Australia has no need for a second spinner except on real turners (if Watson or Hopes were picked then this would depend on the venue). With that in mind, Warne is a certainty for as long as he plays, with MacGill as no more than backup. For those odd Tests though, I'm inclined to go with Cullen, to give him a chance in favourable conditions, and because in three or four years he is likely to be our only real spin option.
The pace bowling lineup needs balance above all -- something lacking in the last Ashes debacle. If three bowlers are to be picked, then two need to be capable of bowling tight lines, to support Warne, each other and the more attacking third option. For the next two years, this means picking two of McGrath, Gillespie, Clark, and Bracken. For preference, one of the first two, and one of the second two; although Gillespie is closer in age to the others I fear injury has taken a greater toll on his body. If three of them are unavailable then Johnson, Dorey or Griffith.
The attacking comes from Lee, who, like the others, has probably four years in him at most. If form or injury intervenes then Johnson, then Tait. Lee is so smooth that injuries are rare, but Tait seems to be plagued by them, which means the likely long-term option is Johnson.
Form and injuries are variable, class remains unjudged, but some rough predictions can be made for the next two series that count. Likely substitutes for injury/form/retirement in (parentheses), less likely backups in [brackets].
Too near now to predict the team will change much. Based on the South African performance, probably a reasonable lineup. More than anything there are players who can restrict and frustrate England's free scorers.
McGrath (Gillespie), Lee, Clark (Bracken), Warne [Cullen, Johnson]
Far enough away that it's hard to predict, but not so far that names no one has heard of yet will emerge, unless they are genuine superstars. McGrath will be gone for sure, Gillespie and Clark almost certainly as well. Warne we can hope, but it might be in vain; perhaps he'll want 1000 wickets. Lee will be 33 by then as well, but should be there. I mark Tait down because I fear he'll never be accurate enough to survive at Test level, but unless Dorey or Watson improve markedly he is the most likely successor to Lee. Note, however, the possibility that we'll have three left-armers, and none of last year's team.
Lee (Tait), Johnson (Watson), Bracken (Dorey), Warne (Cullen) [Griffith, White]
Update: I wrote this before the team to play Bangladesh was selected, and therefore before Australia's decision to pick five bowlers. I think this policy is a grave mistake. Bowlers may win games, but batsmen lose them. The advantage of having a fifth bowler doesn't come near offsetting the extra runs of a sixth bat. And not just in raw averages. Batsmen rarely score their average; it is a skewed distribution with a few scores at the high end and lots of low ones. The more of them you have, the closer the team will get to its expected value -- ie. the more stable your totals will be and the less likely a collapse for not many will put you out of the game.
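The stability argument can be roughed out with a quick simulation. This is only a sketch under loud assumptions: every batsman's score is drawn independently from the same exponential distribution with a mean of 35 (skewed, with lots of low scores and a few big ones), the tail is ignored, and a "collapse" is arbitrarily taken to be a combined total under 100. None of those numbers come from real data.

```python
import random
import statistics

def simulate_totals(n_batsmen, mean_score=35.0, trials=20000, seed=1):
    """Simulate the combined runs of n_batsmen over many innings."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        # Assumption: each score is an independent exponential draw.
        sum(rng.expovariate(1.0 / mean_score) for _ in range(n_batsmen))
        for _ in range(trials)
    ]

def summarise(totals, collapse_below=100.0):
    """Return mean total, relative spread, and share of collapses."""
    mean = statistics.mean(totals)
    spread = statistics.pstdev(totals) / mean  # std dev relative to the mean
    collapse_rate = sum(t < collapse_below for t in totals) / len(totals)
    return mean, spread, collapse_rate

for n in (5, 6):
    mean, spread, collapse_rate = summarise(simulate_totals(n))
    print(n, "bats: mean", round(mean, 1),
          "relative spread", round(spread, 3),
          "collapse rate", round(collapse_rate, 3))
```

Under these assumptions the sixth bat does more than add his average: the spread of the combined total shrinks relative to its mean, and totals below the collapse line become noticeably rarer.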
Cricket - Analysis 9th April, 2006 11:04:50