From grabiner@math.harvard.edu Sat Feb 8 18:04:33 1992
Newsgroups: rec.sport.baseball
Subject: How reliable are minor-league stats? (was: In defense of Doug Frobel)
From: grabiner@math.harvard.edu (David Grabiner)
Date: 8 Feb 92 18:00:34 EST
Organization: Harvard University Dept. of Mathematics
Summary: As good as major-league stats, except for strikeouts and walks
Nntp-Posting-Host: zariski.harvard.edu
In-reply-to: jimdean@bnr.ca's message of 5 Feb 92 16:13:37 GMT

In article <1992Feb5.161337.13447@bnr.ca>, Jim Dean writes:

> DG: David Grabiner
> GH: Gary Huckabay

>GH> Why should we?  Minor league statistics do a fine job of predicting
>GH> how well someone will hit in the majors.  About as good a job as major
>GH> league statistics.  If you wait until someone PROVES they can play in
>GH> the majors before you play them, you're losing a lot of productive
>GH> player-years, and you're not going to be a long-term contender.

I wrote something similar myself, and decided to check it out.  I should
now revise the claim: MLE's are just as good as major-league statistics
in predicting future performance, with the exception of strikeouts and
walks.

>DG> I can confirm this.  Bill James did MLE's for 30 rookies in the 1985
>DG> Baseball Abstract, and Frobel was the only player whose MLE was
>DG> inconsistent with his actual performance; his MLE was a .282 average,
>DG> and he hit .203.  No other MLE was off by more than 48 points.

(numbers corrected slightly from previous posting)

I checked Bill James's sample; he looked at every rookie who had an MLE
of at least 200 AB in 1983, and played for at least 200 AB in 1984.  The
average change in batting average was 25 points.

For comparison, I looked at a sample of 30 players who had 200 AB in two
consecutive seasons.  I chose the 1988 and 1989 seasons because the 1990
STATS handbook was the reference I had at the time, and looked at the
first 30 players in alphabetical order who met these standards.
The average change in batting average was 24 points.

There were a similar number of major errors.  In the MLE sample, Doug
Frobel lost 79 points of batting average, Ken Phelps lost 48 (but hit
with more power and walks), and Bobby Meacham gained 42.  In the
major-league sample, Wally Backman lost 72 points, Jerry Browne gained
70, Greg Brock gained 53, and Kevin Bass gained 45.

I defined a major error in the approximations as a number which was two
standard deviations away from the predicted value, excluding very small
values such as a prediction of one homer for a player who actually hit
four.  (In such cases, the statistical assumptions break down.)

A perfect prediction method would still make one major error for every
20 predictions, just because of normal statistical fluctuation.  If a
player's abilities did not change from one year to the next, using one
year to predict the next year would make one major error for every six
predictions.

There were 21 major errors in the major-league data, out of 138
meaningful projections (average, doubles, walks, and strikeouts for
everyone, and homers for 18 of the 30 players).  There was also one
error in a projection of a small number; Marty Barrett hit one homer in
1988 and 11 in 1989.

There were 29 major errors in the MLE data, out of 137 meaningful
projections, and one error in a projection which wasn't likely to be
meaningful; Juan Samuel projected to hit 8 triples and hit 19.

However, 11 of 30 MLE projections for walks, and 9 of 30 for strikeouts,
were off; that's 1/3 of the walk and strikeout projections.  (6 of 30
major-league walk predictions and 5 of 30 strikeout predictions were
off, about what would be expected.)  The errors were not consistently in
either direction.

This suggests that MLE's are not good for projecting strikeouts and
walks, probably because of differences in minor-league pitching.
Everything else projects just as well from MLE's as from major-league
data.
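
For anyone who wants to check the "one in 20" and "one in six" figures,
here is a rough sketch of the arithmetic in Python.  It rests on
assumptions that go beyond what is stated above: the season-to-season
noise is taken to be normally distributed, both seasons are taken to
carry the same independent noise, and the projections are treated as
independent of each other.  The function names are made up for this
illustration.

```python
import math

THRESHOLD_SD = 2.0  # the cutoff used above for a "major error"


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p).

    ASSUMPTION: the n projections are independent, which is only
    approximately true (one player can account for several errors).
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Perfect predictor: the true level is known, so only the observed
# season is noisy, and a major error is a plain 2-sd fluctuation.
p_perfect = two_sided_tail(THRESHOLD_SD)

# One season predicting the next, with unchanged ability.
# ASSUMPTION: both seasons have equal, independent noise, so their
# difference has sd sqrt(2) times larger, and 2 sd of one season is
# only sqrt(2) sd of the difference.
p_year_to_year = two_sided_tail(THRESHOLD_SD / math.sqrt(2))

print(f"perfect predictor: 1 major error in {1 / p_perfect:.1f}")
print(f"year to year:      1 major error in {1 / p_year_to_year:.1f}")

# Expected major errors for the sample sizes mentioned above.
for n in (138, 137, 30):
    print(f"{n} projections: about {n * p_year_to_year:.1f} expected")

# Chance of 11 or more misses in 30 projections at the baseline rate.
print(f"P(>= 11 of 30): {binomial_upper_tail(30, 11, p_year_to_year):.4f}")
```

Under those assumptions the rates come out near 1 in 22 and 1 in 6, and
138 projections would be expected to produce about 22 major errors,
which is in line with the 21 found in the major-league sample.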

And as for the inconsistency of Doug Frobel's MLE with his performance
(and also Ken Phelps, whose season was as good as his MLE's, but with
more walks, fewer strikeouts, fewer doubles and a lower average),
Backman, Brock, and Browne all had two consecutive seasons which don't
look like they belong to the same player.

As an interesting side note, Buechele was responsible for two of the
major errors in the major-league sample; he fell from 65 walks to 36,
and increased from 79 strikeouts to 107.  It could happen again.

-- 
David Grabiner, grabiner@zariski.harvard.edu
"We are sorry, but the number you have dialed is imaginary."
"Please rotate your phone 90 degrees and try again."
Disclaimer: I speak for no one and no one speaks for me.