Line Drive % Possible Error: Minnesota Twins 2006
While I was throwing around different sabermetric stats and their validity, I came to the next entry on my list, Line Drive Percentage (LD%). When a batted ball is hit, it falls into one of three categories: ground ball, line drive or fly ball. Seems fairly reasonable.
So I headed over to The Hardball Times and looked at some of the team LD% stats from 2006. Everything seemed in line except one team’s stats: the Minnesota Twins. The AL average LD% is 19.7% (or .197), but the Twins posted a high 21.4%. The second best? The White Sox, at 20.1%. What gave the Twins a so much higher LD%?
Let’s look at a couple of stats. Over the past year, some of the authors at The Hardball Times have concluded that LD% + .120 roughly equals BABIP. Checking the 2006 league stats, the AL LD% is .197 and the BABIP is .308, a .111 differential. The ’06 NL LD% is .200 and the BABIP is .301, a .101 differential. All’s well so far. Now back to the ’06 Twins, with a .214 LD% and a .319 BABIP (tied for the league high), for a .105 differential, a fairly substantial gap from the AL’s .111. Is something amiss?
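The rule of thumb is easy to check against the numbers above. Here is a minimal Python sketch that uses only the figures quoted in this post. The variable names are just for illustration.

```python
# League/team figures quoted above for 2006: (LD%, BABIP).
SAMPLES = {
    "AL": (0.197, 0.308),
    "NL": (0.200, 0.301),
    "Twins": (0.214, 0.319),
}
RULE_OF_THUMB = 0.120  # BABIP is roughly LD% + .120


def differential(ld_pct: float, babip: float) -> float:
    """BABIP minus LD%, rounded to three places like the figures in the post."""
    return round(babip - ld_pct, 3)


for name, (ld_pct, babip) in SAMPLES.items():
    predicted = ld_pct + RULE_OF_THUMB
    print(f"{name:6s} diff {differential(ld_pct, babip):.3f}  "
          f"rule-of-thumb BABIP {predicted:.3f} vs actual {babip:.3f}")
```

On these figures the AL lands at .111, the NL at .101 and the Twins at .105. The .120 rule of thumb overshoots all three.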
The Twins’ strong pitching staff allowed a league-average 20% LD rate, and the Twins led the AL in fewest runs allowed per game, at 4.22.
Among the individual LD% leaders, one finds predominantly doubles hitters near the top of the list. This makes sense, as a hard line drive to the outfield has a good chance of going for extra bases. Here are the 2006 top five individual LD% leaders, with their doubles and HR totals:
- Freddy Sanchez: 53 doubles, 6 HR
- Mark Loretta: 33 doubles, 5 HR
- Adam Kennedy: 26 doubles, 4 HR
- Joe Mauer (Twins): 36 doubles, 13 HR
- Michael Young: 52 doubles, 14 HR
It appears that doubles do in fact correlate with LD%. That should answer it, then; the Twins must have hit a ton of doubles.
Team doubles, 2006:
- Top: 357
- Bottom: 266
- Average: 307
- Twins: 275
Huh? Maybe these line drives are going for triples.

Team triples, 2006:
- Top: 40
- Bottom: 16
- Average: 28
- Twins: 34
Just six over the league average, certainly not enough to carry the LD%. HRs? Nope, the Twins were second to last with only 143. And we already know that the team’s BABIP was tied for the league high at .319, so defenses weren’t taking away an unusually large number of hits, and more importantly doubles.
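Here are the two lists lined up against the average team. This is a quick Python tally of the numbers above, nothing more.

```python
# 2006 team totals quoted above: (top, bottom, average, Twins).
TOTALS = {
    "doubles": (357, 266, 307, 275),
    "triples": (40, 16, 28, 34),
}


def gaps_vs_average(totals: dict) -> dict:
    """Twins total minus the average team total, per stat."""
    return {stat: twins - avg for stat, (_top, _bottom, avg, twins) in totals.items()}


gaps = gaps_vs_average(TOTALS)
for stat, gap in gaps.items():
    print(f"{stat}: {gap:+d} vs average")
print(f"doubles + triples combined: {sum(gaps.values()):+d}")
```

The six extra triples leave the Twins 26 doubles-plus-triples short of the average team.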
What exactly is going on, then? Most stat services use official game data from the MLB scorer and translate it into their statistics. Some services assign someone in-house to watch a team’s games and enter the data. I’m not sure which way Baseball Info Solutions (the supplier of the LD% stats) runs its ship, but something is going on, and a data tracker somewhere appears to be line drive happy. Until this is resolved, I personally may take line drive statistics with a grain of salt.
If you have the answer, please share it with us in the comment forum below.
I believe the Twins had a lot of line-drive singles hitters last year:
- Punto: 23.6 LD%
- Tyner: 27.0 LD%
- Nevin: 25.0 LD%
- White: 21.3 LD%
- Bartlett: 22.2 LD%
I can’t post a chat here, but I’ll have a link to an article I’m writing on The Pastime when it’s done. I think it will shed some light on the problem of the Twins’ LD%.
Whoops, that was supposed to be “I can’t post a chart” not “a chat”.
But are their high LD% numbers due to an error in scoring? Here are the hitters who have played elsewhere:
Nevin was never close to 25% anywhere else in his career. He played for three teams in 2006, posting 22.7% in Chicago, 21.9% in Texas and 25% in Minnesota. In 2005 he had 20.6% in SD and 15.9% in Texas.
Rondell White went from 16.7% in Detroit in 2005 to 21.3% in Minnesota in 2006.
I must add that both of these players are in baseball’s “decline” age group, which makes the rise extra puzzling. One would expect their bats to slow down, bringing a drop in LD%, not huge gains.
Also, Tyner is a small sample, but he had an 18% rate in 2005 and then a 27% rate in 2006. Something appears to be going on with the 2006 scoring.
Punto: 21.2% in 2005, 23.6% in 2006
Bartlett: 18.2% in 2005, 22.2% in 2006
That’s all five from your list. And all five had pretty decent gains in LD% in 2006. This is either A) a really odd coincidence or B) an error in scoring.
Baseball Info Solutions has a link on its site advertising jobs for outsourced fans: you show up, mark down everything on their sheet, then upload it when you get home. I’m not sure whether they do this for MLB games, but they may have a new data keeper who’s LD happy.
I don’t think the scorers are to blame here. Look at players who’ve been with the Twins for more than a year, such as Lew Ford (17.1, 16.0, 16.4), Luis Rodriguez (20.7, 19.2), Michael Cuddyer (18.6, 17.9, 20.6) and Torii Hunter (14.8, 14.3, 18.0).
Cuddyer’s and Hunter’s increases are mild, and they coincide with career years. It’s hard to find enough evidence that the scorer is to blame. The difference from moving to a new home ballpark may be enough.
Here’s a link to the numbers and analysis I’m basing my opinion off of:
http://thepastime.net/2007/02/08/the-line-drive-twins/
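Putting the two groups of LD% figures from these comments side by side makes the contrast concrete. This is a rough Python sketch. It assumes Nevin’s 2005 San Diego rate is his "before" figure, and that the last two numbers listed for Ford, Rodriguez, Cuddyer and Hunter are 2005 and 2006. The comments don’t state either of those outright.

```python
# LD% pairs quoted in the comments, as (earlier season, 2006).
# Assumptions: Nevin's "before" value is his 2005 San Diego rate; for the
# longer-tenured Twins, the last two listed values are taken as 2005 and 2006.
HIGH_LD_FIVE = {
    "Punto": (21.2, 23.6),
    "Tyner": (18.0, 27.0),
    "Nevin": (20.6, 25.0),
    "White": (16.7, 21.3),
    "Bartlett": (18.2, 22.2),
}
LONGER_TENURED = {
    "Ford": (16.0, 16.4),
    "Rodriguez": (20.7, 19.2),
    "Cuddyer": (17.9, 20.6),
    "Hunter": (14.3, 18.0),
}


def changes(pairs: dict) -> dict:
    """Change in LD% (percentage points) from the earlier season to 2006."""
    return {name: round(after - before, 1) for name, (before, after) in pairs.items()}


for label, group in (("high-LD five", HIGH_LD_FIVE), ("longer-tenured", LONGER_TENURED)):
    deltas = changes(group)
    mean = sum(deltas.values()) / len(deltas)
    print(f"{label}: {deltas}  mean {mean:+.2f} points")
```

The averages come out to about +4.9 points for the five and about +1.3 for the four. With samples this small, those averages alone can’t separate a park effect from a scoring change.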
But how many players on the entire roster had a decline?
Of the 13 players with a decent number of ABs in 2006, 11 had an increase in LD%, with only 2 declines. And Batista, Cuddyer, Nevin, Morneau, White, Bartlett and Hunter all had pretty significant increases in their rates. Keep in mind that about half of this list would be considered power hitters, which runs somewhat against the line-drive-singles theory, although they still hit a high number of singles themselves.
The players with the highest slugging percentages (in order: Morneau, Mauer, Cuddyer, Hunter) all had increases in LD%. Is the scorer handing out LD hits for hard-hit balls to the outfield?
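One way to gauge the "11 of 13" figure is a simple sign test. Suppose each player’s LD% were equally likely to rise or fall, independently of the others (an assumption, not something the data shows). How often would 11 or more rise? A minimal Python sketch:

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_PLAYERS = 13
N_INCREASES = 11
# Null hypothesis (an assumption): each player's LD% is equally likely to rise
# or fall, independently of the others.
p_value = prob_at_least(N_INCREASES, N_PLAYERS)
print(f"P(at least {N_INCREASES} of {N_PLAYERS} increases) = {p_value:.4f}")
```

This comes out to about 0.011 (92 of 8,192 equally likely outcomes). A result that small suggests the rises aren’t coin flips. However, a home-park effect, a league-wide shift or a scoring change would each produce it, so the result doesn’t point to the scorer by itself.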
BIS uses in-house data analysts who watch video of the major league games. It’s the same small group of people for all teams.
Fans only do the minor league games.
A couple of problems with the argument…
1) The Twins’ margin over the second-best AL team is overstated, since the #2 team was not Chicago at .201 but Cleveland at .207. [The White Sox were third highest.]
2) The “missing doubles” are missing because, in addition to their above-average LD% [.214 vs. a league average of .197], the Twins were also 2nd in GB% and, most importantly, last in FB% [.32 vs. a league average of .37]. Line drives turn into doubles at a significantly higher rate than fly balls do, but overall about half of doubles come on line drives and about half on fly balls. So the extra doubles expected from the extra line drives would be offset by the doubles lost on fewer fly balls. The Hardball Times Annual shows the run value of the Twins’ LDs as .38 runs per LD, vs. the league average of .39, so there isn’t much reason to think that the Twins had an unusual skew of outcomes on the line drives they did hit.
In sum, no reason to suspect bias from the scorers.
Thanks for posting. Now here’s my rebuttal.
1. FB% will invariably be lower given the high number of LDs. That is also exactly what would happen if FBs were being incorrectly scored as LDs.
I’ve yet to see a stat outlining the FB-to-double percentage; could you link us something?
2. The GB% is very high, but a GB is easier to score. Also, a GB down the line, or even one hit hard through a hole, can result in a double, just as a fly ball can. Obviously this percentage will be low, but it would be interesting to see.
3. You’re right about Cleveland’s LD%; I shouldn’t have missed that. But it raises an interesting point in my eyes. With MLB’s current schedule, the scorer responsible for the Twins should also do a high percentage of AL Central games.
AL Central LD%:
Twins: .214
Indians: .207
White Sox: .201
KC: .198
DET: .191
Outside of the Tigers, the other teams are all above the AL average. The top three LD% figures in all of baseball in 2006 were .214, .207 and .201. All are AL Central teams, and all were most likely scored in a number of games by the same scorer who covers the Twins. That seems like a VERY odd coincidence. Maybe the AL Central is king of the LD. Or maybe something’s up…
OK, here’s my response to your rebuttal:
1) As you note, if line drives are being misclassified, fly balls will necessarily decrease. But the Twins’ LD (.214) plus FB (.316) percentages sum to .530, which is below the league average of roughly .197 + .36 = .557. Since fly balls also correlate strongly with doubles (though not as strongly as I said originally; see below), your argument against the Twins’ line drive percentage on the basis of missing doubles doesn’t hold. Note also that the Twins’ home park seems to have suppressed doubles the last few years: the Bill James Handbook gives it a park factor of 95 for doubles. This further explains the “low” doubles total.
One discussion of batted ball outcomes is here with further links. I was in error when I said 1/2 of doubles came on fly balls; it’s actually about 35%. (I was correct when I said that half of doubles came on line drives.)
You can find fairly similar results in different datasets. In the 2006 Hardball Times Annual, for example, double rates of .019 per GB, .061 per OF fly and .183 per LD were published in an article called “What’s a batted ball worth” by Dave Studenmund, using BIS data from 2002-2005.
2) About 2% of GBs become doubles; they are almost all on balls down the line. This is somewhat park dependent [these balls mostly spin into foul territory, so the size and shape of outfield foul territory matters]. Note that there can be discrepancies in coding a GB vs. an LD as well. Is a one-hopper through the infield that lands on the dirt, but passes the fielder’s actual position before it bounces, a line drive or a ground ball? Different services may differ in their definitions… They probably all call a ball caught knee-high by a third baseman who is playing in a line drive, whereas the same ball would be a one-hopper and a grounder if he’s playing deep. But which is it if it goes through for a hit?
3) You are still not looking at the HBT LD% carefully enough. It is false that the top three teams in baseball are from the AL Central. For some reason, HBT gives the NL figures to two decimal places, whereas the AL figures go to three. Colorado, at 21%, is definitely part of the top three. There are also seven NL teams with an LD% that rounds to .20 (six would in the AL, topped by the White Sox). Some NL team(s) besides Colorado may also exceed the White Sox.
Furthermore, the AL Central is above average offensively in a league where pitchers rarely bat. The Central teams were 2nd, 3rd, 5th, 8th and 12th in the AL in runs scored. So again, with three of the top five and four of the top eight, it’s not intrinsically suspicious that teams in the division would also dominate the LD% list. You need to make a more careful argument than calling it a very odd coincidence, as if LD% should just be random.
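To see how much the extra line drives and the missing fly balls offset each other, the Studenmund rates from point 1 can be applied to the Twins’ and league LD%/FB% figures quoted there. This is a rough Python sketch. It ignores ground balls and treats every fly ball as an outfield fly; both are simplifications.

```python
# Doubles per batted ball by type, as quoted from the 2006 Hardball Times Annual
# (BIS data, 2002-2005). Ground balls are left out of this sketch.
# Assumption: every fly ball is treated as an outfield fly.
DOUBLES_PER = {"LD": 0.183, "FB": 0.061}


def doubles_per_batted_ball(ld_pct: float, fb_pct: float) -> float:
    """Expected doubles per batted ball from line drives and fly balls only."""
    return ld_pct * DOUBLES_PER["LD"] + fb_pct * DOUBLES_PER["FB"]


twins = doubles_per_batted_ball(0.214, 0.316)
league = doubles_per_batted_ball(0.197, 0.36)   # league FB% is the approximate ".36"
ld_gain = (0.214 - 0.197) * DOUBLES_PER["LD"]   # doubles added by the extra line drives
fb_loss = (0.316 - 0.36) * DOUBLES_PER["FB"]    # doubles lost to the fewer fly balls
print(f"Twins {twins:.4f}, league {league:.4f} doubles per batted ball")
print(f"LD gain {ld_gain:+.4f}, FB loss {fb_loss:+.4f}, net {ld_gain + fb_loss:+.4f}")
```

Under those assumptions, the extra line drives add about .0031 doubles per batted ball and the fewer fly balls take away about .0027. That leaves the Twins’ mix within about .0004 of the league’s, so the mix alone shouldn’t produce many extra doubles.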
The LD% you are critiquing comes from BIS scorers using their definition of a line drive; they score line drives more generously than other services such as STATS, MLB or the source used by Retrosheet. BIS supposedly uses a pool of scorers who are not tied to individual teams, so it shouldn’t be exposed to the kind of systematic error you hypothesized.
Nonetheless, Retrosheet’s version of balls in play does bring the Twins well back toward the middle of the pack, which supports your view that something could be going on with the Twins’ LD%. Constructing Retrosheet LD% the same way HBT does, the league leader would be Toronto at .200 vs. a league average of .190, with the Twins 6th at .192. There are some problems with the Retrosheet data (they score every SF as a fly, never as an LD, which accounts for about 10% of their difference with BIS, and a handful of plays are missing hit types), but these would have limited impact on the teams’ relative order. Here are AL LD% figures based on Retrosheet data:
TOR .200
CLE .199
BAL .197
BOS .197
OAK .195
MIN .192
TEX .191
KC .190
NYY .187
DET .186
SEA .185
CHW .181
TB .178
LAA .177
I am not saying that Retrosheet is right and BIS is wrong about the Twins, just indicating that they disagree about the Twins fairly significantly, and this is unlikely to be explicable just in terms of their differing definitions of what a line drive is. BIS gets points for consistency from me because, at least when it comes to sacrifice flies, they seem to generally stick to their standard for coding a line drive, whereas Retrosheet does not, and STATS does not appear to either. [This is based on my personal review of video cross-referenced against STATS and BIS game logs.] But I can’t say one way or the other whether BIS is better in general at being consistent with balls on the borderline between line drives and fly balls…
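Here is a quick check of the Retrosheet list above. It takes a plain unweighted mean of the team rates. The real league figure would weight teams by batted balls, so this is only an approximation.

```python
# Retrosheet-based AL LD% figures listed above.
RETRO_LD = {
    "TOR": 0.200, "CLE": 0.199, "BAL": 0.197, "BOS": 0.197, "OAK": 0.195,
    "MIN": 0.192, "TEX": 0.191, "KC": 0.190, "NYY": 0.187, "DET": 0.186,
    "SEA": 0.185, "CHW": 0.181, "TB": 0.178, "LAA": 0.177,
}

# Unweighted mean of the team rates (an approximation of the league figure).
mean_ld = sum(RETRO_LD.values()) / len(RETRO_LD)
# Highest LD% first; ties keep the order in which they were listed.
ranking = sorted(RETRO_LD, key=RETRO_LD.get, reverse=True)
print(f"unweighted mean {mean_ld:.3f}; MIN ranks #{ranking.index('MIN') + 1}")
```

The unweighted mean comes out at about .190, matching the quoted league average, and Minnesota sits 6th.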
I have to believe that BIS won’t assign games by team, but perhaps by division. And with the Twins, Indians and White Sox going 1-2-3 in the AL, that gives me reason to pause.
I would hope that they assign games randomly, since introducing a potential bias like this would be crazy. I could understand it if scorers had to be physically present at the park, but when scoring off video, the parks and teams should be split up.
Thanks again for the reply guys.
I did miss the Rockies stats. I’ll take the time to go through the list fully this time.
I’ll leave the other stuff alone for now, but bringing up the Retrosheet stats is interesting, along with the team/division/time zone questions.
How up to date is BIS info? Does it come right after a game, or is it watched on tape at a later time? Games or viewers may be assigned by time zone, or even by division. I’m going to compare the BIS LD% data with the Retrosheet LD data. I’ll put BIS in the first column, then the Retrosheet figure, then the difference, and after that the differential as a percentage. Note the large individual and overall differences between the AL Central group and the rest of the AL.
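The comparison described above can be partly rebuilt for the five AL Central teams from figures quoted earlier in this thread. The BIS values come from the Central list and the Retrosheet values from the AL list, with Minnesota at .192. This is a minimal Python sketch. It assumes the percentage differential is measured relative to the Retrosheet value, a convention that reproduces the White Sox’s 11%, the Royals’ and Indians’ "over 4%" and the Tigers’ roughly 2.68%. The "rest of the AL" figure can’t be rebuilt here because BIS values for those teams aren’t quoted.

```python
# AL Central LD% figures quoted in this thread: BIS (via The Hardball Times)
# and Retrosheet (with Minnesota at the corrected .192).
BIS = {"MIN": 0.214, "CLE": 0.207, "CHW": 0.201, "KC": 0.198, "DET": 0.191}
RETRO = {"MIN": 0.192, "CLE": 0.199, "CHW": 0.181, "KC": 0.190, "DET": 0.186}


def pct_differential(bis_ld: float, retro_ld: float) -> float:
    """(BIS - Retrosheet) as a percentage of the Retrosheet value (assumed convention)."""
    return 100 * (bis_ld - retro_ld) / retro_ld


by_team = {team: pct_differential(BIS[team], RETRO[team]) for team in BIS}
for team in sorted(by_team, key=by_team.get, reverse=True):
    print(f"{team:4s} BIS {BIS[team]:.3f}  Retro {RETRO[team]:.3f}  "
          f"diff {BIS[team] - RETRO[team]:+.3f}  ({by_team[team]:.2f}%)")

# Division-wide figure from the summed rates (same as comparing the two means).
overall = pct_differential(sum(BIS.values()), sum(RETRO.values()))
print(f"AL Central overall: {overall:.2f}%")
```

Computed on the summed rates, the division-wide figure lands at about 6.65%. That is in line with the 6.6% figure in the next paragraph.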
The rest of the AL has an average discrepancy of 1.87%. The lowest discrepancy in the AL Central is 2.68%, and the overall difference between BIS and Retrosheet for the division is 6.6%. This value is of course carried by the Twins, but the White Sox have the second-highest value (11%), and the Royals and Indians would rank 6th and 7th, both over a 4% differential. Even if the Retrosheet data is called into question, there is a large spread between the two systems’ AL Central LD data.
MIN was .192, not .182, per the Retrosheet data.
You’re right. It seems like I have to screw up at least one easy number each post :) I fixed it above, and the Twins number is now closer to the White Sox. But the AL Central as a whole still carries a large spread over the rest of the AL.
love it
This is great – thanks for following up.
Fundamentally, the Twins and White Sox look like outliers, while the other three Central teams look “normal”. But I am amazed that Retrosheet actually had higher counts than BIS for Baltimore and Oakland.
We don’t know whether the “bad scoring” is on the Retrosheet side, the BIS side or a mix of both. There are at least two other independent sources that could be pulled into this. mlb.com has files containing play-by-play data by game, accessible on the web; there is some corruption, and ~3% of their plays need to be “scrubbed”, usually by looking at video. STATS has ground ball and fly ball counts available at the team level, as well as in the game logs for individual pitchers. I don’t have scrubbed mlb.com data for 2006, but here are the other three sources for Minnesota for 2006:
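Counts shown as BIS / STATS / Retro:
Ground balls (G): 2217 / 2243 / 2254
Fly balls, including popups (F): 1493 / 1494 / 1562
Line drives (L): 1009 / ???? / 907
Bunts (B): 97 / ???? / 95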
So it appears that I have a minor error in my Retrosheet query (counting dropped foul popups or something like that), because BIS adds up to the correct total of 4816 but Retrosheet adds up to 4818. At the aggregate level, BIS and STATS agree more on fly balls + popups than they do on ground balls! [Note my previous comment.] And somehow sources fail to agree completely on such seemingly unquestionable events as bunts. [STATS-derived information on league leaders in bunts in play can be found; they have Juan Pierre at 52 bunts in play for 2006, whereas BIS has 49. Retrosheet also has him with 49, with 2 more bunt-foul strikeouts.]
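The table above can be sanity-checked in a few lines of Python. Dividing line drives by non-bunt batted balls appears to reproduce the .214 (BIS) and .192 (Retrosheet) Twins figures quoted earlier. That convention is inferred from the numbers here, not taken from a published HBT definition.

```python
# Minnesota 2006 batted-ball counts from the table above (None = not available).
COUNTS = {
    "BIS":   {"G": 2217, "F": 1493, "L": 1009, "B": 97},
    "STATS": {"G": 2243, "F": 1494, "L": None, "B": None},
    "Retro": {"G": 2254, "F": 1562, "L": 907, "B": 95},
}
TRUE_TOTAL = 4816


def ld_pct_excluding_bunts(c: dict) -> float:
    """Line drives divided by non-bunt batted balls (an inferred convention)."""
    return c["L"] / (c["G"] + c["F"] + c["L"])


for source, c in COUNTS.items():
    if None in c.values():
        print(f"{source}: G + F = {c['G'] + c['F']} (LD and bunts not available)")
        continue
    total = sum(c.values())
    print(f"{source}: total {total} ({total - TRUE_TOTAL:+d}), "
          f"LD% excluding bunts {ld_pct_excluding_bunts(c):.3f}")
```

BIS sums to 4816 and Retrosheet to 4818, which is the two-ball gap mentioned above.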
As far as I know, there are no public details about how BIS scorers are assigned games, or how much cross-checking takes place. I think we do know, however, that they score from video, whereas STATS depends on scorers at the ballpark. My own experience trying to chart hit locations from video is that it’s particularly hard on medium-depth balls hit to the outfield, because there are no obvious landmarks in view from the infield or fence as the camera zooms in on the descending ball. So I can imagine that BIS might end up having scorers who specialize by park (as opposed to by team), which might enable them to do a better job locating these kinds of balls…
you guys are pretty crazy heh