The Use Of Sabermetric Stats Makes Me Uncomfortable
I admit that the title of this post is a bit of a misnomer. Some – not all – sabermetric stats are being used/interpreted in a way that makes me uncomfortable. Let me explain.
“There’s a man in Mobile who remembers that Honus Wagner hit a triple in Pittsburgh 46 years ago. That’s baseball. So is the scout reporting that a 16-year-old pitcher in Cheyenne is a coming Walter Johnson. Baseball is a spirited race of man against man, reflex against reflex. A game of inches. Every skill is measured. Every heroic, every failing is seen and cheered, or booed. And then it becomes a statistic.” – Ernie Harwell
For the casual fan who may not know where sabermetric statistics come from, allow me to give you a quick background. Sabermetrics is derived from the acronym SABR, which stands for the Society for American Baseball Research. The term was coined by Bill James. From Joe Posnanski’s What Keeps the Game Great:
(Bill James) began his quixotic writer’s life in the 1970s as a security guard at the Stokely-Van Camp cannery in Lawrence, Kans. (I’ve often imagined that Bill protected the pork from the beans.) There he would pore over box scores clipped out of The Sporting News and try to figure out baseball. He was intensely interested in what was real about the game.
Bill James defined sabermetrics as “the search for objective knowledge about baseball.” Sabermetrics is the analysis of baseball through objective, empirical evidence, especially baseball statistics that measure in-game activity. Thanks to James’ influence, front offices and fans alike now view the game of baseball differently. To me, the biggest effect that James has had on the game of baseball was minimizing the importance of batting average and maximizing the importance of on-base percentage. How many times a hitter reaches base on a batted ball in play matters less than whether he makes an out or not.
When it comes to defense – before sabermetrics arrived on the scene – a player’s defense was evaluated by a statistic known as fielding percentage ((putouts + assists) / total number of chances). In other words, a player was deemed to be a defensive stud if he made the fewest errors. James recognized that even if one were to place a pillar of salt at shortstop, that pillar of salt would not make many errors – yet it wouldn’t make any plays either. The real defensive stud was the player who could get to the balls that other players couldn’t. Range factor ((putouts + assists) / # of innings at a position) and Ultimate Zone Rating (UZR) were created to attempt to evaluate these competencies.
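To make the pillar-of-salt point concrete, here is a rough sketch of the two formulas in Python. The two season lines are invented purely for illustration, and the per-nine-innings scaling of range factor is an assumption on my part (it is a common way to present the stat, but the formula above only says to divide by innings).

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """(putouts + assists) / total chances, where chances include errors."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances if chances else 0.0


def range_factor_per_nine(putouts: int, assists: int, innings: float) -> float:
    """Plays made (putouts + assists) per nine innings at the position."""
    return 9 * (putouts + assists) / innings if innings else 0.0


# Hypothetical season lines: a sure-handed fielder who reaches few balls,
# and a rangy fielder who reaches many more but boots a few of them.
statue = {"putouts": 150, "assists": 250, "errors": 4, "innings": 1200.0}
rangy = {"putouts": 230, "assists": 420, "errors": 18, "innings": 1200.0}

for name, line in (("statue", statue), ("rangy", rangy)):
    fp = fielding_percentage(line["putouts"], line["assists"], line["errors"])
    rf = range_factor_per_nine(line["putouts"], line["assists"], line["innings"])
    print(f"{name}: fielding pct = {fp:.3f}, range factor per 9 = {rf:.2f}")
```

With these made-up numbers the "statue" wins on fielding percentage while the rangy fielder takes part in far more outs, which is exactly the gap range factor was meant to expose.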
I don’t ever like to play the “I played the game so I know” card, because not having played the game doesn’t preclude one from being able to understand and critique it. I also don’t like to be painted with the “old-school dinosaur” brush, since I believe that sabermetric statistics do have value and I am all for continuous improvement. That said, I come from the game-playing perspective and I can tell you there is a lot more going on than the number of groundballs hit to a player divided by number of innings. Great progress has been made thanks to the efforts of Bill James, such as recognizing that RBI and Save stats are useless. But some sabermetric stats make me uncomfortable. They make me uncomfortable because they are often used on their own and perceived to be the final word in the assessment of a player despite being flawed. Here they are:
Range Factor
Range Factor (commonly abbreviated as RF) is a baseball statistic developed by Bill James. It is calculated by dividing putouts and assists by number of innings or games played at a given defense position. The statistic is premised on the notion that the total number of outs that a player participates in is more relevant in evaluating his defensive play than the percentage of cleanly handled chances as calculated by the conventional statistic fielding percentage.
– Baseball Reference
I wholeheartedly agree that the total number of outs that a player participates in is more relevant in evaluating his defense than the percentage of cleanly handled chances. That said… it is important to recognize that it is not always up to the player to put himself in a position to participate in an out.
You can spend your time poring over box scores all you like, but they don’t always tell the whole story. For example, during my playing career I played for a coach who was obsessed with doubles defense and would often employ this defensive strategy in situations that didn’t call for it in a traditional sense. If you don’t know, doubles defense is when the outfielders position themselves deep so balls don’t go over their heads and the corner infielders play close to the line to prevent ground balls hit down the foul lines for doubles. It is the coach/manager’s decision when to employ this strategy and he will instruct the players where to position themselves between plays. As a pitcher, I would get hot under the collar as I would often execute a sinking fastball to induce a ground ball, only for the ball to roll through the infield – where the third baseman should have been positioned – for a base hit. Because of the coach’s decision to employ doubles defense, the third baseman’s Range Factor is negatively affected. Is it his fault? No. The same argument can be made for teams that employ the shift defense on dead-pull hitters. That is one flaw in this statistic.
Another flaw is that the statistic can be skewed based on the type of pitching staff a team employs. A team that plays in a bandbox ballpark might choose to employ a groundball pitching staff. As a result, the infielders will have more chances to pad their stats with putouts and assists while the outfielders will be unfairly penalized by having fewer chances. Conversely, a team in a cavernous ballpark like Petco Park will be able to get away with having a flyball pitching staff, so the outfielders will have more chances and the infielders will have fewer. Although it is a far better statistic than fielding percentage, does Range Factor really tell the whole story of a player’s defensive prowess in relation to other players in the league? I don’t think so.
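The pitching-staff effect is easy to see with some back-of-the-envelope arithmetic. The sketch below assumes one third baseman with the same skill in both scenarios (he converts 90% of the balls hit his way) and changes only how many balls his pitchers send toward him. Every number here is hypothetical, and the per-nine scaling is again my assumption.

```python
def range_factor_per_nine(plays_made: float, innings: float) -> float:
    """Plays made (putouts + assists) per nine innings."""
    return 9 * plays_made / innings


INNINGS = 1200.0
CONVERSION_RATE = 0.90  # assumed identical skill in both scenarios

# Hypothetical number of balls hit toward the third baseman over a season.
chances_groundball_staff = 480
chances_flyball_staff = 360

rf_groundball = range_factor_per_nine(CONVERSION_RATE * chances_groundball_staff, INNINGS)
rf_flyball = range_factor_per_nine(CONVERSION_RATE * chances_flyball_staff, INNINGS)

print(f"behind a groundball staff: {rf_groundball:.2f}")
print(f"behind a flyball staff:    {rf_flyball:.2f}")
```

Same player, same skill, noticeably different Range Factor, simply because of who is on the mound.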
Ultimate Zone Rating (UZR, UZR/150)
UZR is an advanced defensive metric that uses play-by-play data recorded by Baseball Info Solutions (BIS) to estimate each fielder’s defensive contribution in theoretical runs above or below an average fielder at his position in that player’s league and year. Thus, a SS with a UZR of zero is exactly average as compared to a SS in the same year and in the same league. If his UZR is plus, he is above average, and if it is minus, he is below average.
–Fangraphs.com
For more info on UZR, see the Fangraphs explanation quoted above. Apart from the uncomfortable fact that UZR relies on a computer to parse through 6 years of footage to determine whether a play made is reasonable or not, UZR has two notable flaws. Firstly, UZR is subject to the same limitation as Range Factor: defensive positioning. Secondly, UZR is generally accepted to be effective only when analyzed over a 3-year period due to sample size concerns. A stat that needs 3 years of data to be effective is useless to me, mostly because player performance can fluctuate wildly from season to season. Take, for example, Alex Rodriguez of the New York Yankees. A-Rod played the entire 2008 season at third base with a torn labrum in his hip, which severely limited his range of motion… as well as his range in the field. In March of 2009 he elected to have surgery on his hip to remedy the problem. Following the surgery, A-Rod talked about how he now had better range of motion in his hip than he would have had if it had never been torn in the first place.
In this instance you have a player who plays at an above-average level of defense, becomes injured, plays at a low level of defense, has surgery, recovers and possibly plays at a higher level than before he got hurt – all over a 3-year span. Is UZR going to be able to quantify how good a defensive player A-Rod really is? I don’t think so.
Another example that is applicable to this situation is when a player is learning a new position or even comes up from the minor leagues. For instance, I remember when Orlando Hudson was called up from the minor leagues. Upon his arrival in Toronto, his defensive game was flashy, yet a little rough around the edges. Under the tutelage of infield coach Brian Butterfield, Hudson’s defense improved by leaps and bounds. Now let’s say I am looking at evaluating Hudson’s defense in his 4th year as a major leaguer and I am looking at UZR. Does it really matter to me what his defense was like 3 years ago? Is it representative of the player he is today? Not in the least.
As former professional baseball player Morgan Ensberg says:
The term “Garbage in, Garbage out” is the most accurate description I can give. If the sample used is garbage, then the answers won’t be accurate.
Sabermetrics requires accurate information or organizations may misinterpret the data.
In other words, in the cases of Range Factor and UZR, the underlying data that feeds the systems is subjective, not objective, and prone to varying kinds of bias.
OPS
When I first heard about OPS it became one of my favourite statistics. OPS is the acronym for on-base percentage + slugging percentage (On-base Plus Slugging). After looking at the league leaders in OPS (all of them being studs) I hastily came to the conclusion that OPS was a sure-fire way to measure studliness. To put it into perspective, here are baseball’s all-time leaders in career OPS:
1. Babe Ruth, 1.1638
2. Ted Williams, 1.1155
3. Lou Gehrig, 1.0798
4. Barry Bonds, 1.0512
5. Albert Pujols, 1.0511
Since then, OPS has quickly permeated baseball consciousness and has crept onto some ballpark scoreboards as a go-to stat when judging a player’s offensive performance.
It is generally accepted that for a player to be a stud, his OPS should be greater than .850 or .900. I began to think about players who are studs without relying on the power game to see how their OPS levels shook out. Namely, Ichiro Suzuki.
Suzuki, an MVP award winner, a Rookie-of-the-Year award winner and 10-time All-Star, has a career OPS of .793. In fact, not once in his career has Ichiro ever sniffed an OPS of .900. Not too impressive, is it? Especially for a player who, in my mind, is a future Hall-of-Famer. Admittedly, Ichiro’s production has taken a nosedive this season, but a few years ago many GMs would have jumped at the chance to build their franchise around the Japanese star. What’s the issue then? The fault of OPS is that it weights on-base percentage and slugging percentage equally, although on-base percentage correlates better with scoring runs – which is more important – and coincidentally scoring runs is Ichiro Suzuki’s game. Slugging percentage also tends to be 75 to 100 points higher, on average, than on-base percentage, so it dominates the sum. Suzuki is unfairly penalized by this statistic as a result, as are all players whose game is to get on base and create run-scoring opportunities. Unlike useless UZR, OPS does have value, but care needs to be taken not to place too much emphasis on the stat on its own, as it does not judge all types of players equally.
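Here is a small sketch of the equal-weighting problem. The two hitters are hypothetical, and the 1.8 multiplier on on-base percentage is only a rule-of-thumb weight that I am assuming for illustration; it is not part of OPS and nothing above establishes it as the "right" number.

```python
def ops(on_base_pct: float, slugging_pct: float) -> float:
    """Plain OPS: the two rates are simply added together."""
    return on_base_pct + slugging_pct


def weighted_ops(on_base_pct: float, slugging_pct: float, obp_weight: float = 1.8) -> float:
    """OPS variant that counts OBP more heavily (the weight is an assumption)."""
    return obp_weight * on_base_pct + slugging_pct


# Hypothetical hitters: an on-base specialist and a low-OBP slugger.
table_setter = {"obp": 0.400, "slg": 0.420}
slugger = {"obp": 0.320, "slg": 0.500}

for name, line in (("table setter", table_setter), ("slugger", slugger)):
    plain = ops(line["obp"], line["slg"])
    weighted = weighted_ops(line["obp"], line["slg"])
    print(f"{name}: OPS = {plain:.3f}, OBP-weighted = {weighted:.3f}")
```

Plain OPS calls these two hitters identical. Once getting on base is given extra credit, the table setter comes out ahead, which is the point about players like Ichiro.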
Wins Above Replacement (WAR)
WAR is a single number used to represent the number of wins a player added to a team as opposed to a mythical “replacement player” – think a AAA call-up or major league journeyman. Firstly, there is no consistent formula established to calculate WAR, though the Fangraphs, Baseball Reference and Baseball Prospectus versions are the most popular ones. Although this isn’t really a concern of mine, the concern I do have is with how the stat is calculated. Since the Fangraphs version uses UZR, which needs 3 years of data for a large enough sample size, any WAR built on it is, by definition, inaccurate.
Secondly, the way WAR is calculated for pitchers and position players is far different, yet many compare a position player’s WAR to a pitcher’s WAR as if they were equal. They are not. This is a flaw.
Lastly, the problem I have with WAR is semantics – I’m not comfortable with the fact that “wins” can be assigned to an individual player in a team game such as baseball.
BABIP
“Keep your eye clear, and hit ’em where they ain’t” – Wee Willie Keeler
Part of most everybody’s method of hitter evaluation includes a look at the player’s BABIP. The acronym stands for Batting Average on Balls In Play – a statistic that tries to answer whether a hitter is lucky or unlucky. Players with a BABIP lower than the league average (~.300) are considered to be unlucky and should theoretically expect their performance to improve. Those who have a BABIP higher than the league average should reasonably expect their performance to come back down to earth, or so the prevailing wisdom goes.
However, some hitters are more dependent on BABIP than others. Remember, the stat measures balls in play, so a fly-ball hitting slugger will hit fewer balls in play and not be subject to the statistic as much as a slap hitter like an Ichiro Suzuki or… Wee Willie Keeler. I would argue that placing balls where players are not positioned is an actual skill and not a by-product of luck. That’s why this statistic makes me uncomfortable.
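For anyone who wants to see the arithmetic, here is a sketch using the standard BABIP formula, (H - HR) / (AB - K - HR + SF). The two stat lines are invented for illustration and do not belong to any real player.

```python
def balls_in_play(at_bats: int, strikeouts: int, home_runs: int, sac_flies: int) -> int:
    """At-bats that ended with a ball in the field of play, plus sacrifice flies."""
    return at_bats - strikeouts - home_runs + sac_flies


def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    in_play = balls_in_play(at_bats, strikeouts, home_runs, sac_flies)
    return (hits - home_runs) / in_play if in_play else 0.0


# Hypothetical season lines for a slap hitter and a strikeout-prone slugger.
slap_hitter = dict(hits=210, home_runs=6, at_bats=650, strikeouts=65, sac_flies=4)
slugger = dict(hits=150, home_runs=40, at_bats=550, strikeouts=170, sac_flies=6)

for name, line in (("slap hitter", slap_hitter), ("slugger", slugger)):
    in_play = balls_in_play(line["at_bats"], line["strikeouts"],
                            line["home_runs"], line["sac_flies"])
    print(f"{name}: {in_play} balls in play, BABIP = {babip(**line):.3f}")
```

In this made-up example the slap hitter puts far more balls in play than the slugger, so much more of his production rides on what the stat is measuring.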
Skill-Interactive Earned Run Average (SIERA)
Skill-Interactive Earned Run Average estimates ERA through walk rate, strikeout rate and ground ball rate, eliminating the effects of park, defense and luck. How is it calculated, you ask?
SIERA = 6.145 – 16.986*(SO/PA) + 11.434*(BB/PA) – 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/– 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) – 5.195*(BB/PA)*((GB-FB-PU)/PA)
The statistic is so convoluted that Baseball Prospectus removed SIERA from their toolbox of stats. That’s good enough for me.
All of the statistics listed above make me uncomfortable when used on their own. When analyzing a player I like to couple conventional statistics with sabermetric ones. A stat like OPS will not provide total clarity; however, when looked at alongside conventional stats such as hits, home runs, bases-on-balls and stolen bases, a clearer picture of a player’s production begins to form. When looking at ERA I’ll also check out FIP while also considering innings pitched, strikeouts, GB/FB ratio, K:BB ratio, hits, home runs, walks and intentional walks. Range factor is better when paired with a statistic like fielding percentage. While a player may be able to get to balls on the periphery, is he able to convert them into outs?
Ideally, what I want from my statistics are black-and-white outcomes. That’s what I love about OBP – either the player made an out or he didn’t. There’s no sample size error, no assumption of universal causation, no convoluted formula. Putting statistics into a greater context allows us to fully flesh out what a player brings to the table. I recognize that the gripes I present about the stats listed above may be perceived by some as minor. As I said before, I realize that the stats I am uncomfortable with still have value; however, I want accuracy from my stats – not flakiness. It is when these stats are viewed as the “be all and end all” that I become uncomfortable.
Vin Scully said it best when it comes to evaluating players using statistics:
Statistics are used much like a drunk uses a lamppost: for support, not illumination.
– Vin Scully
Until the day Sabermetrics can give me 100% accuracy, I will continue to be uncomfortable with how they are used on their own.
Is my thinking flawed? Do I have a misunderstanding of one or more sabermetric stats? Leave a note in the comment section to try to convince me otherwise.
Images are courtesy of screencaps of the Fox classic sitcom “The Simpsons,” Keeping Our Finger on the Pulse, Brandon Julien, and the Associated Press.