Useless statistical indicator alert Russell Degnan

In the past few years statistics have got a real jump in cricket. While we lag behind baseball by some margin, coverages are now full of interesting (albeit sometimes pointless) graphics, and articles abound on this or that statistical artifact. Unfortunately, some of them are complete crap.

The Numbers Game is like that. I like to read it a lot; it often produces interesting numbers, like the contribution by the last 5 wickets or the series on individual strokes (although all of them have their flaws). But it has an annoying tendency to present figures that are clearly no more than luck as some sort of keen insight into the game.

I mentioned above that baseball is well beyond cricket in terms of statistics, and there is a simple reason for it. They understand error, and distributions, and the all-important difference between transient and persistent phenomena. You never see an error estimation in the Numbers Game. If there were, he would soon realize that cricketers play so few games, and have such variable results, that seemingly sensible statistics like "the difference between performances on the sub-continent and away" are only valid for a few players with dozens of games on each. He clearly has no sense of distribution, for reasons we'll see shortly. And the consistency with which figures are presented that were the result of a few lucky innings, rather than a season-by-season result, is phenomenal.

Baseball understands this. And still they make mistakes. But to see what I mean, I highly recommend this article by Bill James. It lays out some of the problems in great detail, concluding in part, that many things can't be measured -- either there or not there -- because the errors in baseball are too large.

Cricket, if anything, is more so. Take this table of differences between the performances of left and right hand batsmen against different sides. It looks plausible, but the figures don't support it. The measure that should be looked at here is not the difference between the two, but whether the performances match what you would expect. When you look at those numbers, all sides are well within 4 runs or 10% of expectation (based on projecting the linear correlation with r-squared: 0.8968). Yet the averages themselves are so variable (it only takes a double-hundred or a half-dozen cheap wickets to shift one a run or so) that any difference is probably just luck.
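The expected-versus-actual check described above can be sketched in a few lines of Python. The side labels and averages below are hypothetical placeholders, not the figures from the table, and the choice of regressing left-hand averages on right-hand averages is an assumption about how the expectation was projected. The point is only the method: fit a least-squares line, then see how far each side sits from it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept and r-squared for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def residuals(xs, ys):
    """Actual value minus the value the fitted line would predict."""
    slope, intercept, _ = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# HYPOTHETICAL averages conceded by six sides, for illustration only.
right_hand = [28.0, 31.5, 33.0, 35.5, 38.0, 41.0]
left_hand = [27.0, 33.0, 32.0, 37.5, 37.0, 42.5]

slope, intercept, r2 = fit_line(right_hand, left_hand)
print(f"r-squared: {r2:.4f}")
for side, gap in zip("ABCDEF", residuals(right_hand, left_hand)):
    print(f"Side {side}: {gap:+.2f} runs from expectation")
```

A side is only interesting if its gap from the line is large compared with how much an average moves from one big innings; a gap of a run or two is not.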

The difference between first and second innings performances is a little different. The correlation is much lower, so there is more going on than just standard variation. But here the time-frame (since 1990) is so long that you can't say anything useful about current sides, or even a side from the mid-90s. What should be plotted is the difference between the expected second innings averages and the actual second innings averages, for each side, for each year. Then we would either see a trend, or a lot of natural variation, or something in between.

But neither of those has anything on this week's effort. The use of standard deviation in this case is hopelessly misguided. For many reasons. The distribution of a batsman's scores is heavily skewed, with over a third (as a rule) less than 20. Standard deviation is not only more likely to measure the ability of a batsman to score big centuries (ie. Lara, Atapattu, Zaheer Abbas, Bradman), but more likely to be low when the median is near the mean (ie. when the average is low, as for Pollock, Marsh, and Hadlee).

But he didn't just use standard deviation; he created an index of average/st_dev. But look at what that is:

( runs / ( innings - notouts ) )
-----------------------------------------
sqrt( sum( diff_means ^ 2 ) / innings )

Which means several things:
- Not outs provide an arbitrary cap on the potential runs, and therefore affect the average less than the standard deviation. This is why there is an arbitrary 5000 run limit. Without that, you get Pollock (1.33), Brett Lee (1.21) and other useful lower end batsmen in the top 10.
- A high score affects an average much less than it affects the standard deviation (try it and see). Players without big knocks (Mark Waugh, Chanderpaul, Ranatunga) do better.
- The number of innings increases your index by the square root of that number. Hence a player with the same score distribution, but quadruple the number of knocks will have double the index.
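The "try it and see" in the second point can be done in a few lines of Python. This is a rough sketch, not the Numbers Game's own calculation: the scores are hypothetical, and it assumes diff_means is the deviation of each score from the plain mean of all scores, as the formula above is written.

```python
from math import sqrt


def batting_average(scores, not_outs=0):
    """Runs divided by dismissals (innings minus not outs)."""
    return sum(scores) / (len(scores) - not_outs)


def std_dev(scores):
    """Population standard deviation of the scores about their mean."""
    mean = sum(scores) / len(scores)
    return sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))


def index(scores, not_outs=0):
    """Average divided by standard deviation, per the formula as written."""
    return batting_average(scores, not_outs) / std_dev(scores)


# HYPOTHETICAL scores, for illustration only.
steady = [0, 5, 12, 20, 25, 30, 35, 40, 45, 60]
with_big_knock = steady + [250]

for label, scores in (("steady", steady), ("plus a 250", with_big_knock)):
    print(
        f"{label}: average {batting_average(scores):.2f}, "
        f"st_dev {std_dev(scores):.2f}, index {index(scores):.2f}"
    )
```

Adding the one big score raises the average, but raises the standard deviation by a larger factor, so the index falls even though the batsman has done nothing but score more runs.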

It is an interesting figure, and consistency is in the mix, but so are lots of other factors you don't want, which do nothing but skew the figures.

There are at least two better ways to measure consistency. One is the median, which gives the central knock and is a reasonable way of telling how often a player gets a start. Another would be to remove the innings bias by dividing the current index by the square root of the number of innings (unsurprisingly, Bradman dominates given this measure).
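As a small illustration of the median idea, here is a sketch in Python with two hypothetical batsmen who have the same total runs from the same number of innings. The score lists and the cut-off of 20 for a "start" are assumptions made for the example, not figures from any real player.

```python
from statistics import mean, median


def share_of_starts(scores, threshold=20):
    """Fraction of innings reaching the (assumed) threshold for a start."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# HYPOTHETICAL scores: same aggregate, very different shapes.
batsman_a = [0, 4, 8, 11, 15, 22, 30, 45, 120, 201]
batsman_b = [18, 22, 25, 30, 33, 38, 41, 47, 52, 150]

for name, scores in (("A", batsman_a), ("B", batsman_b)):
    print(
        f"Batsman {name}: mean {mean(scores):.1f}, "
        f"median {median(scores):.1f}, "
        f"starts {share_of_starts(scores):.0%}"
    )
```

The means are identical, but the median separates the batsman who relies on two huge scores from the one who gets in most of the time.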

Regardless, a more than cursory examination of the statistics being produced would also help. Just because it is constructed to say something doesn't mean it necessarily does.

Cricket - Analysis 29th April, 2006 19:09:44

The Future of Australia's Attack Russell Degnan

Australia's bowling stocks are not unusually low, but the turnover has been. Players who would normally have been picked years ago have been kept out by the solidity of Warne, McGrath, Gillespie, Lee, MacGill and Kasprowicz. This will change sooner rather than later, and probably in the next two years. Here is how I see it. The good news for Australia is that there do seem to be some good players available in the near and long term. The question is how to manage the transition.

The incumbents:

Warne remains a genuine competitor, and I hope he stays till he's 40, but unfortunately for everyone except opposition batsmen, he also might not. Retirement looms ever closer.

McGrath is probably gone. Even when he comes back, if he comes back, it won't be for long. The Ashes at most. Then to replace the irreplaceable.

Lee is the current spearhead, but not so long ago he was dropped for form. Since then he has had one good series (away to South Africa) against quality opposition (with due respect to the Windies), albeit with 12 of 17 wickets against tail-enders. The good news is his economy rate is dropping, but he needs to stay on it. An average in the low 30s won't be good enough.

Kasprowicz showed nothing of particular worth against South Africa. In a young bowler you might persist, but his age (34) tells against him, and he should be dropped sooner rather than later.

Clark had the best debut by a bowler in a long time, and looked pretty solid doing it. He is almost 31 though, and a first-class average near 30 makes him, at best, a two-year stop-gap, and at worst, a one-series wonder.

MacGill is another on the wrong side of 35, can't field and can't bat. He might be tempted to play till Warne retires, but a successor he isn't. Australia needs another spinner, and getting someone in to learn from the current master should be the strategy.

The pace prospects:

Bracken was unlucky to get dropped, given his recent record and first-class form. But the selectors picked Clark right, even if Kaspa was wrong. On the right side of 30, he should be playing.

Gillespie for me, unlike most commentators, is not finished. Not quite 31, he has at least three seasons in him, and remains the best line-and-length bowler Australia has after McGrath. It will be the Australian team's loss if he spends his last days knocking over Shield sides for not many.

Tait has injured himself again, but frankly doesn't need to be playing Tests for a couple of years if we can avoid it. As Lee is finally learning, accuracy counts at the top level and Tait needs some before he plays again.

Johnson is another young player with big raps on him. Performed well in the Shield final but averaged 30 over the season. Like Tait, remains more of a prospect than an option.

Hopes has the advantage of being an all-rounder and the disadvantage of no one being sure of what. Needs to do one thing really well before he'll get a Test look-in.

Watson is a better prospect than Hopes. Still young, but unfortunately more of a batsman than a bowler. A reasonable bet if Australia plays two spinners, but not otherwise.

Dorey seems to have an alright record given his limited experience. His one-dayer experience was a let-down, but he will probably be back.

Griffith isn't mentioned much, but was the highest wicket-taker in the Shield this season. Seven five-fors in 30 matches speaks well of him, although, like Dorey, he'll be 30 in a couple of seasons.

And the spin prospects:

Cullen is very young and a spinner. Both rare commodities. Picked for Bangladesh, but struggled this season. Probably a decade from his best cricket.

White, as the only young leg-spinner around, must consider himself a chance, but like Watson, his batting is stronger than his bowling. Might come on, but lacks the unnatural spin or drift a top-class spinner would have.

With Symonds or Clarke in the side Australia has no need for a second spinner except on real turners (if Watson or Hopes were picked then this would depend on the venue). With that in mind, Warne is a certainty for as long as he plays, with MacGill as no more than backup. For those odd tests though, I'm inclined to go with Cullen, to give him a chance in favourable conditions, and because in three or four years he is likely to be our only real spin option.

The pace bowling lineup needs balance above all -- something lacking in the last Ashes debacle. If three bowlers are to be picked, then two need to be capable of bowling tight lines, to support Warne, each other and the more attacking third option. For the next two years, this means picking two of McGrath, Gillespie, Clark, and Bracken. For preference, one of the first two, and one of the second; although Gillespie is closer in age to the others I fear injury has taken a greater toll on his body. If three of them are unavailable then Johnson, Dorey or Griffith.

The attacking comes from Lee, who, like the others, has probably four years in him at most. If form or injury intervenes then Johnson, then Tait. Lee is so smooth that injuries are rare, but Tait seems to be plagued by them, which means the likely long-term option is Johnson.

Form and injuries are variable, class remains unjudged, but some rough predictions can be made for the next two series that count. Likely substitutes for injury/form/retirement in (parentheses), less likely backups in [brackets].

Ashes 2006-07
Too near now to predict the team will change much. Based on the South African performance, probably a reasonable lineup. More than anything there are players who can restrict and frustrate England's free scorers.
McGrath (Gillespie), Lee, Clark (Bracken), Warne [Cullen, Johnson]

Ashes 2009
Far enough away that it's hard to predict, but not so far that names no one has heard of yet will emerge, unless they are genuine superstars. McGrath will be gone for sure, Gillespie and Clark almost certainly as well. Warne we can hope, but it might be in vain; perhaps he'll want 1000 wickets. Lee will be 33 by then as well, but should be there. I mark Tait down because I fear he'll never be accurate enough to survive at test level, but unless Dorey or Watson improve markedly he is the most likely successor to Lee. Note, however, the possibility that we'll have three left-armers, and none of last year's team.
Lee (Tait), Johnson (Watson), Bracken (Dorey), Warne (Cullen) [Griffith, White]

Update: I wrote this before the team to play Bangladesh was selected, and therefore before Australia's decision to pick five bowlers. I think this policy is a grave mistake. Bowlers may win games, but batsmen lose them. The advantage of having a fifth bowler doesn't come near offsetting the extra runs of a sixth bat. And not just in raw averages. Batsmen rarely score their average; it is a skewed distribution with a few scores at the high end and lots of low ones. The more of them you have, the closer the team will get to its expected value -- ie. the more stable your totals will be and the less likely a collapse for not many will put you out of the game.
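The stability argument can be illustrated with a crude simulation in Python. This is a sketch under loud assumptions, not a model of a real innings: every batsman is given the same hypothetical average of 30, scores are drawn independently from an exponential distribution as a stand-in for the skewed shape described above, and the tail is ignored.

```python
import random
from statistics import mean, pstdev


def simulate_totals(n_batsmen, avg=30.0, trials=20000, seed=1):
    """Simulated combined totals for n_batsmen with skewed individual scores."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.expovariate(1.0 / avg) for _ in range(n_batsmen))
        for _ in range(trials)
    ]


def relative_spread(totals):
    """Standard deviation of the totals as a fraction of their mean."""
    return pstdev(totals) / mean(totals)


for n in (5, 6):
    totals = simulate_totals(n)
    print(
        f"{n} batsmen: mean total {mean(totals):.1f}, "
        f"relative spread {relative_spread(totals):.3f}"
    )
```

Under these assumptions the relative spread shrinks roughly with the square root of the number of batsmen, which is the sense in which an extra bat brings the total closer to its expected value.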

Cricket - Analysis 9th April, 2006 11:04:50

Ratings - April 2006 Russell Degnan

Bangladesh v Sri Lanka
Opening Ratings: Ban: 610.86 Sri: 1067.61
1st Test: Sri Lanka by 8 wickets
2nd Test: Sri Lanka by 10 wickets
Closing Ratings: Ban: 601.18 Sri: 1074.84

Bangladesh are clearly improving, but without the results to prove it. Inconsistency and a weak bottom order are their current problems. A brilliant 136 by Mohammad Ashraful in the first test allowed them to hold Sri Lanka to a 19 run lead, but Muralitharan took over from there. The second test played out on similar lines for a similar result. The batting of Habibul Bashar, Mohammad Rafique and Ashraful is reasonable, but depending on those three to do the job is unrealistic. Unfortunately for Sri Lanka, they are heading in the same direction. Muralitharan continues to dominate their bowling, although he had reasonable support against weak Bangladeshi batsmen. Similarly, if you take out Tharanga's performances at the top, the batting is suspect; although some good players were missing, there is not much depth. Sri Lanka are on the wane, Bangladesh on the up, but the ratings stay the same for now.

India v England
Opening Ratings: Ind: 1152.41 Eng: 1204.38
1st Test: Drawn
2nd Test: India by 9 wickets
3rd Test: England by 212 runs
Closing Ratings: Ind: 1142.25 Eng: 1213.19

A potential classic cut short. All three tests fluctuated, each a contest between bat and ball, but in the final balance England probably had the best of the series. India's win came on the back of a brilliant performance by Kumble, as the batsmen struggled with just one century and a heavy reliance on runs down the order. Their pitiful performance trying to save the series win in the final test showed the dangers of that, all out 100 in 48 overs. For England, Monty Panesar bowled well without wickets, but Hoggard defied the conditions to lead the wicket-takers. The batting was even and solid, except Bell, who continues to struggle against spin. The ratings remain unchanged, but India have some question marks over them.

New Zealand v West Indies
Opening Ratings: NZ: 1051.26 WI: 812.23
1st Test: New Zealand by 27 runs
2nd Test: New Zealand by 10 wickets
3rd Test: Drawn
Closing Ratings: NZ: 1050.90 WI: 812.76

The West Indies remain an enigma. They threw away the first test after a 148 run opening stand chasing 263, Bond and Vettori's efforts aside; the second test went more to form, and recent history. The third went to the weather, but games in New Zealand are like that. With the possible exception of Gayle, no one on either side performed particularly well, although two players performed particularly badly, which cost their side the series: Lara and Chanderpaul. New Zealand were particularly uninspiring in winning these games, which might mean that the West Indies are slowly turning around, or it might mean that New Zealand are waning again. Either way, an uninspiring couple of games after a long summer.

South Africa v Australia
Opening Ratings: SAF: 1122.41 AUS: 1349.53
1st Test: Australia by 7 wickets
2nd Test: Australia by 112 runs
3rd Test: Australia by 2 wickets
Closing Ratings: SAF: 1084.75 AUS: 1377.72

A more convincing performance by Australia than in the home leg, winning all three matches, and only struggling in the last. The reasons for the victory were similar though: Australia's best four batsmen -- particularly Ponting and Hussey -- collectively scored almost more runs than South Africa's entire top order. South Africa's bowling was dependent on Ntini, and their batting on an average Kallis, while Australia had a firing Lee, Warne and the outstanding debut of Clark. There is a lot of mediocrity in the South African batting, and it is not clear any of the current side can play better than they are -- except Smith, who had a shocker, and Gibbs, who, like Sehwag, is finally being found out for his shoddy footwork. South African cricket needs to rebuild, because at the moment it is flailing and declining.

But what of Australia? I said the home series raised a lot of questions, and they haven't been answered satisfactorily. Symonds remains a mystery. It is worthwhile having one batsman who can change the game with blazing strokes and fast scoring, but with Gilchrist filling that role, and continuing to struggle, Symonds' selection is an indulgence that puts pressure on Lee and Warne to score runs when they fail. Especially when his bowling, while useful for holding up an end, lacks any penetration. Martyn played alright, got out when he had starts in the first two tests, then scored a hundred without finishing the job in the last. He was probably the scapegoat for four players' failure in the Ashes, but questions remain over his temperament when the going is hard.

It is the bowling though with the most question-marks. It is not inconceivable that all four bowlers in this series won't be playing in two years, for form or age. That is a problem, but I'll expound on that in another post. In short, based on current news: no Bracken, bad; Gillespie, good; no Kasprowicz, good; Cullen, good.

Forthcoming Series:

Sri Lanka (1074.84) v Pakistan (1152.03) - 2 Tests.

A bit odd to comment on a series that will be over in about 5 hours. The ratings say this will be close, although Sri Lanka appear to be declining, even as Pakistan improve again after a little lull. The actuality is a series full of drama, dominated by bowlers, rain, and the odd sparkling innings. It is still in the balance too. A pity it is so short.

Bangladesh (601.18) v Australia (1377.72) - 2 Tests.

Australia to win. Bangladesh have a history of over-performing against Australia. But barring rain, their lack of batting depth makes it hard to see how they'll even manage a draw.

South Africa (1084.75) v New Zealand (1050.90) - 3 Tests.

Unusually close in the ratings for the first time in a long time, South Africa need to win to arrest their relative slide down the rankings. New Zealand aren't going anywhere, haven't for some time, and lack any genuine match-winners, so it would be a surprise if South Africa didn't win. But on recent form this will be a mediocre series.

Zimbabwe (9th) 672.64

Cricket - Ratings - Test 5th April, 2006 16:55:22