The Use Of Sabermetric Stats Makes Me Uncomfortable

 

I admit that the title of this post is a bit of a misnomer.  Some – not all – sabermetric stats are being used/interpreted in a way that makes me uncomfortable.  Let me explain.


“There’s a man in Mobile who remembers that Honus Wagner hit a triple in Pittsburgh 46 years ago. That’s baseball. So is the scout reporting that a 16-year-old pitcher in Cheyenne is a coming Walter Johnson. Baseball is a spirited race of man against man, reflex against reflex. A game of inches. Every skill is measured. Every heroic, every failing is seen and cheered, or booed. And then it becomes a statistic.” – Ernie Harwell

 

For the casual fan who may not know where sabermetric statistics come from, allow me to give you a quick background. The word sabermetrics is derived from the acronym SABR, which stands for the Society for American Baseball Research. The term was coined by Bill James. From Joe Posnanski’s What Keeps the Game Great:

(Bill James) began his quixotic writer’s life in the 1970s as a security guard at the Stokely-Van Camp cannery in Lawrence, Kans. (I’ve often imagined that Bill protected the pork from the beans.) There he would pore over box scores clipped out of The Sporting News and try to figure out baseball. He was intensely interested in what was real about the game.

Bill James defined sabermetrics as “the search for objective knowledge about baseball.” Sabermetrics is the analysis of baseball through objective, empirical evidence, especially baseball statistics that measure in-game activity. Thanks to James’ influence, front offices and fans alike now view the game of baseball differently. To me, the biggest effect that James has had on the game was to diminish the importance of batting average and elevate the importance of on-base percentage. How often a hitter reaches base on a ball in play matters less than whether he makes an out or not.

When it comes to defense, before sabermetrics arrived on the scene, a player’s defense was evaluated by a statistic known as fielding percentage: (putouts + assists) / total number of chances. In other words, a player was deemed to be a defensive stud if he made the fewest errors. James recognized that even if one were to place a pillar of salt at shortstop, that pillar of salt would not make many errors, yet it wouldn’t make any plays either. The real defensive stud was the player who could get to the balls that other players couldn’t. Range factor, (putouts + assists) / # of innings at a position, and Ultimate Zone Rating (UZR) were created to attempt to evaluate these competencies.
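To make the pillar-of-salt point concrete, here is a minimal sketch of the two formulas. The season lines are invented for illustration and do not belong to any real player, and scaling range factor to nine innings is the commonly published form rather than something spelled out above.

```python
def fielding_percentage(putouts, assists, errors):
    """(PO + A) / total chances, where total chances = PO + A + E."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances if chances else 0.0


def range_factor_per_nine(putouts, assists, innings):
    """9 * (PO + A) / innings played at the position."""
    return 9 * (putouts + assists) / innings if innings else 0.0


# Hypothetical shortstops, each with 1,200 innings in the field.
INNINGS = 1200
statue = {"putouts": 150, "assists": 300, "errors": 5}   # rarely errs, reaches little
rangy = {"putouts": 230, "assists": 440, "errors": 20}   # errs more, reaches far more

for name, line in (("statue", statue), ("rangy", rangy)):
    fpct = fielding_percentage(**line)
    rf9 = range_factor_per_nine(line["putouts"], line["assists"], INNINGS)
    print(f"{name}: FPCT={fpct:.3f}  RF/9={rf9:.3f}")
```

The statue wins on fielding percentage while the rangy shortstop takes part in far more outs, which is exactly the gap range factor was meant to expose.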

I don’t ever like to play the “I played the game so I know” card, because not having played the game doesn’t preclude one from being able to understand and critique it. I also don’t like to be painted with the “old-school dinosaur” brush, since I believe that sabermetric statistics do have value and I am all for continuous improvement. That said, I come from the game-playing perspective and I can tell you there is a lot more going on than the number of groundballs hit to a player divided by number of innings. Great progress has been made thanks to the efforts of Bill James, such as recognizing that RBI and Save stats are useless. But some sabermetric stats make me uncomfortable. They make me uncomfortable because they are often used on their own and perceived to be the final word in the assessment of a player, despite being flawed. Here they are:

Range Factor

Range Factor (commonly abbreviated as RF) is a baseball statistic developed by Bill James. It is calculated by dividing putouts and assists by number of innings or games played at a given defense position. The statistic is premised on the notion that the total number of outs that a player participates in is more relevant in evaluating his defensive play than the percentage of cleanly handled chances as calculated by the conventional statistic fielding percentage.

– Baseball Reference

I wholeheartedly agree that the total number of outs that a player participates in is more relevant in evaluating his defense than the percentage of cleanly handled chances. That said… it is important to recognize that it is not always up to the player to put himself in a position to participate in an out.

You can spend your time poring over box scores all you like, but they don’t always tell the whole story. For example, during my playing career I played for a coach who was obsessed with doubles defense and would often employ this defensive strategy in situations that didn’t call for it in a traditional sense. If you don’t know, doubles defense is when the outfielders position themselves deep so balls don’t go over their heads and the corner infielders play close to the line to prevent ground balls hit down the foul lines for doubles. It is the coach/manager’s decision when to employ this strategy and he will instruct the players where to position themselves between plays. As a pitcher, I would get hot under the collar, as I would often execute a sinking fastball to induce a ground ball and the ball would roll harmlessly through the infield – where the third baseman should have been positioned – for a base hit. Because of the coach’s decision to employ doubles defense, the third baseman’s Range Factor is negatively affected. Is it his fault? No. The same argument can be made for teams that employ the shift defense on dead-pull hitters. That is one flaw in this statistic.

Another flaw is that the statistic can be skewed based on the type of pitching staff a team employs. A team that plays in a bandbox ballpark might choose to employ a groundball pitching staff. As a result, the infielders will have more chances to pad their stats with putouts and assists while the outfielders will be unfairly penalized by having fewer chances. Conversely, a team in a cavernous ballpark like Petco Park will be able to get away with having a flyball pitching staff, so the outfielders will have more chances and the infielders will have fewer. Although it is a far better statistic than fielding percentage, does Range Factor really tell the whole story of a player’s defensive prowess in relation to other players in the league? I don’t think so.

Ultimate Zone Rating (UZR, UZR/150)

UZR is an advanced defensive metric that uses play-by-play data recorded by Baseball Info Solutions (BIS) to estimate each fielder’s defensive contribution in theoretical runs above or below an average fielder at his position in that player’s league and year. Thus, a SS with a UZR of zero is exactly average as compared to a SS in the same year and in the same league. If his UZR is plus, he is above average, and if it is minus, he is below average.

– Fangraphs.com

For more info on UZR, click on the link above. Apart from the uncomfortable fact that UZR relies on a computer to parse through 6 years of footage to determine whether a play made is reasonable or not, UZR has two notable flaws. Firstly, UZR is subject to the same limitation as Range Factor: defensive positioning. Secondly, UZR is a stat that is generally accepted to be effective only when analyzed in the context of a 3-year period, due to sample size concerns. A stat that needs 3 years of data to be effective is useless to me, mostly because player performance can fluctuate wildly from season to season. Take, for example, Alex Rodriguez of the New York Yankees. A-Rod played the entire 2008 season at third base with a torn labrum in his hip, which severely limited his range of motion… as well as his range in the field. In March of 2009 he elected to have surgery on his hip to remedy the problem. Following the surgery, A-Rod talked about how he now had better range of motion in his hip than he would have had if it had never been torn in the first place.

In this instance you have a player who plays at an above-average level of defense, becomes injured, plays at a low level of defense, has surgery, recovers and possibly plays at a higher level than before he got hurt – all over a 3-year span. Is UZR going to be able to quantify how good a defensive player A-Rod really is? I don’t think so.

Another example that is applicable to this situation is when a player is learning a new position or even comes up from the minor leagues. For instance, I remember when Orlando Hudson was called up from the minor leagues.  Upon his arrival in Toronto, his defensive game was flashy, yet a little rough around the edges.  Under the tutelage of infield coach Brian Butterfield, Hudson’s defense improved by leaps and bounds.  Now let’s say I am looking at evaluating Hudson’s defense in his 4th year as a major leaguer and I am looking at UZR.  Does it really matter to me what his defense was like 3 years ago?  Is it representative of the player he is today? Not in the least.

As former professional baseball player Morgan Ensberg says:

The term “Garbage in, Garbage out” is the most accurate description I can give.  If the sample used is garbage, then the answers won’t be accurate.

Sabermetrics requires accurate information or organizations may misinterpret the data.

In other words, in the cases of Range Factor and UZR, the underlying data that feeds the systems is subjective, not objective, and prone to varying kinds of bias.

OPS

When I first heard about OPS it became one of my favourite statistics. OPS is short for On-base Plus Slugging: on-base percentage + slugging percentage. After looking at the league leaders in OPS (all of them being studs), I hastily came to the conclusion that OPS was a sure-fire way to measure studliness. To put it into perspective, here are baseball’s all-time leaders in career OPS:

1. Babe Ruth, 1.1638
2. Ted Williams, 1.1155
3. Lou Gehrig, 1.0798
4. Barry Bonds, 1.0512
5. Albert Pujols, 1.0511

Since then, OPS has quickly permeated baseball consciousness and has crept onto some ballpark scoreboards as a go-to stat when judging a player’s offensive performance.

It is generally accepted that for a player to be a stud, his OPS should be greater than .850 or .900.  I began to think about players who are studs without relying on the power game to see how their OPS levels shook out.  Namely, Ichiro Suzuki.

Suzuki, an MVP award winner, a Rookie-of-the-Year award winner and 10-time All-Star, has a career OPS of .793. In fact, not once in his career has Ichiro ever sniffed an OPS of .900. Not too impressive, is it? Especially for a player who, in my mind, is a future Hall-of-Famer. Admittedly, Ichiro’s production has taken a nosedive this season, but a few years ago many GMs would have jumped at the chance to build their franchise around the Japanese star. What’s the issue, then? The fault of OPS is that it weights on-base percentage and slugging percentage equally, although on-base percentage correlates better with scoring runs – which is more important – and coincidentally scoring runs is Ichiro Suzuki’s game. Slugging percentage also tends to be 75 to 100 points higher, on average, than on-base percentage, so it carries more of the sum. Suzuki is unfairly penalized by this statistic as a result, as are all players whose game is to get on base and create run-scoring opportunities. Unlike useless UZR, OPS does have value, but care needs to be taken not to place too much emphasis on the stat on its own, as it does not judge all types of players equally.
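A rough sketch of how OPS is assembled shows where the tilt toward power comes from. The on-base and slugging formulas are the standard ones. The two stat lines are made up for illustration and are not Ichiro’s or anyone else’s actual numbers.

```python
def on_base_percentage(hits, walks, hbp, at_bats, sac_flies):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = at_bats + walks + hbp + sac_flies
    return (hits + walks + hbp) / denom if denom else 0.0


def slugging_percentage(singles, doubles, triples, homers, at_bats):
    """Total bases / AB."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats if at_bats else 0.0


def ops(obp, slg):
    """On-base Plus Slugging: a straight, unweighted sum."""
    return obp + slg


# Hypothetical slap hitter: 200 hits, almost all singles.
slap_obp = on_base_percentage(hits=200, walks=40, hbp=5, at_bats=600, sac_flies=5)
slap_slg = slugging_percentage(singles=170, doubles=20, triples=5, homers=5, at_bats=600)

# Hypothetical slugger: 150 hits, 40 of them home runs.
slug_obp = on_base_percentage(hits=150, walks=85, hbp=5, at_bats=550, sac_flies=5)
slug_slg = slugging_percentage(singles=80, doubles=30, triples=0, homers=40, at_bats=550)

print(f"slap hitter: OBP={slap_obp:.3f} SLG={slap_slg:.3f} OPS={ops(slap_obp, slap_slg):.3f}")
print(f"slugger:     OBP={slug_obp:.3f} SLG={slug_slg:.3f} OPS={ops(slug_obp, slug_slg):.3f}")
```

In this made-up pair the slap hitter reaches base slightly more often, yet trails by more than 100 points of OPS because the gap is all slugging.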

Wins Above Replacement (WAR)

WAR is a single number used to present the number of wins a player added to a team as opposed to a mythical “replacement player” – think a AAA call-up or major league journeyman. Firstly, there is no consistent formula established to calculate WAR, though Fangraphs, Baseball Reference and Baseball Prospectus’ versions are the most popular ones. Although this isn’t really a concern of mine, the concern I do have is with how the stat is calculated. Since it uses UZR, which needs 3 years of data for a large enough sample size, WAR, by definition, is inaccurate.

Secondly, the way WAR is calculated for pitchers and position players is far different, yet many compare a position player’s WAR to a pitcher’s WAR as if they were equal. They are not. This is a flaw.

Lastly, the problem I have with WAR is semantics – I’m not comfortable with the fact that “wins” can be assigned to an individual player in a team game such as baseball.

BABIP

“Keep your eye clear, and hit ’em where they ain’t” – Wee Willie Keeler

Part of most everybody’s method of hitter evaluation includes a look at the player’s BABIP. The acronym stands for Batting Average on Balls In Play – a statistic that tries to answer whether a hitter is lucky or unlucky. Players with a BABIP lower than the league average (~.300) are considered to be unlucky and should theoretically expect their performance to improve. Those who have a BABIP higher than the league average should reasonably expect their performance to come back down to earth, or so the prevailing wisdom goes.

However, some hitters are more dependent on BABIP than others. Remember, the stat measures balls in play, so a fly-ball hitting slugger will put fewer balls in play and not be subject to the statistic as much as a slap hitter like an Ichiro Suzuki or… Wee Willie Keeler. I would argue that placing balls where players are not positioned is an actual skill and not a by-product of luck. That’s why this statistic makes me uncomfortable.
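For reference, a small sketch of the usual BABIP formula, (H - HR) / (AB - SO - HR + SF), which the post does not spell out. Both stat lines below are hypothetical and only meant to show how many fewer balls a strikeout-heavy slugger puts in play.

```python
def balls_in_play(at_bats, strikeouts, homers, sac_flies):
    """AB - SO - HR + SF: at-bats that ended with a ball a fielder could play."""
    return at_bats - strikeouts - homers + sac_flies


def babip(hits, homers, at_bats, strikeouts, sac_flies):
    """(H - HR) / (AB - SO - HR + SF)."""
    bip = balls_in_play(at_bats, strikeouts, homers, sac_flies)
    return (hits - homers) / bip if bip else 0.0


# Hypothetical slap hitter: few strikeouts, few home runs.
slap = dict(hits=180, homers=10, at_bats=600, strikeouts=70, sac_flies=5)
# Hypothetical slugger: many strikeouts, many home runs.
slugger = dict(hits=140, homers=40, at_bats=550, strikeouts=160, sac_flies=5)

for name, line in (("slap hitter", slap), ("slugger", slugger)):
    bip = balls_in_play(line["at_bats"], line["strikeouts"], line["homers"], line["sac_flies"])
    print(f"{name}: balls in play={bip}  BABIP={babip(**line):.3f}")
```

With these invented lines the slap hitter puts 525 balls in play against the slugger’s 355, so a swing in BABIP moves far more of his overall production.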

Skill-Interactive Earned Run Average (SIERA)

Skill-Interactive Earned Run Average estimates ERA through walk rate, strikeout rate and ground ball rate, eliminating the effects of park, defense and luck. How is it calculated, you ask?

SIERA = 6.145 – 16.986*(SO/PA) + 11.434*(BB/PA) – 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/– 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) – 5.195*(BB/PA)*((GB-FB-PU)/PA)

The statistic is so convoluted that Baseball Prospectus removed SIERA from their toolbox of stats.  That’s good enough for me.
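For what it is worth, the SIERA formula quoted in this section can be transcribed into a few lines of code. This is only a sketch: the coefficients are copied from the formula as printed, but how to resolve the “+/–” term is an assumption on my part (minus when the net ground ball rate is positive, plus otherwise), and the sample pitcher is invented.

```python
def siera(so, bb, gb, fb, pu, pa):
    """Transcription of the SIERA formula as quoted in the post.

    so, bb, gb, fb, pu are counts of strikeouts, walks, ground balls,
    fly balls and pop-ups; pa is plate appearances (batters faced).
    """
    k = so / pa               # strikeout rate
    w = bb / pa               # walk rate
    g = (gb - fb - pu) / pa   # net ground ball rate

    # ASSUMPTION: the "+/-" in the quoted formula is taken as a minus sign
    # when the net ground ball rate is positive and a plus sign otherwise.
    sign = -1.0 if g > 0 else 1.0

    return (6.145
            - 16.986 * k
            + 11.434 * w
            - 1.858 * g
            + 7.653 * k ** 2
            + sign * 6.664 * g ** 2
            + 10.130 * k * g
            - 5.195 * w * g)


# Hypothetical pitcher: 1,000 batters faced, 200 SO, 80 BB,
# and as many ground balls as fly balls plus pop-ups.
print(f"{siera(so=200, bb=80, gb=250, fb=200, pu=50, pa=1000):.2f}")
```

Eight terms, three of them interactions, for a number that is meant to be read like ERA; whether that is insight or convolution is the question.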

All of the statistics listed above make me uncomfortable when used on their own. When analyzing a player I like to couple conventional statistics with sabermetric ones. A stat like OPS will not provide total clarity. However, when it is looked at alongside conventional stats such as hits, home runs, bases-on-balls and stolen bases, a clearer picture of a player’s production begins to form. When looking at ERA I’ll also check out FIP while also considering innings pitched, strikeouts, GB/FB ratio, K:BB ratio, hits, home runs, walks and intentional walks. Range factor is better when paired with a statistic like fielding percentage. While a player may be able to get to balls on the periphery, is he able to convert them into outs?

Ideally, what I want from my statistics are black and white outcomes. That’s what I love about OBP – either the player made an out or he didn’t. There’s no sample size error, no assumption of universal causation, no convoluted formula. Putting statistics into a greater context allows us to fully flesh out what a player brings to the table. I recognize that the gripes I present about the stats listed above may be perceived by some as minor. As I said before, I realize that the stats I am uncomfortable with still have value. However, I want accuracy from my stats – not flakiness. It is when these stats are viewed as the “be all and end all” that I become uncomfortable.

Vin Scully said it best when it comes to evaluating players using statistics:

Statistics are used much like a drunk uses a lamppost: for support, not illumination.

– Vin Scully

Until the day Sabermetrics can give me 100% accuracy, I will continue to be uncomfortable with how they are used on their own.

Is my thinking flawed?  Do I have a misunderstanding of one or more sabermetric stats?  Leave a note in the comment section to try to convince me otherwise.

 

Images are courtesy of screencaps of the Fox classic sitcom, “The Simpsons,” Keeping Our Finger on the Pulse, Brandon Julien, and the Associated Press.

Written By

Callum Hughson has written for Mopupduty.com since 2006. Follow Callum on Twitter, LinkedIn and Instagram (@callumhughson)

  • An interesting discussion surrounding this post can be found at Baseball Think Factory here: http://baseballthinkfactory.org/files/newsstand/discussion/hughson_the_use_of_sabermetric_stats_makes_me_uncomfortable/

  • Early

    Nice article. Whoever said that SABR stats should stand alone? There are Triple Crowns for a reason. I love OPS but of course there are studly outliers. I would build my team around Pujols before Ichiro because of the power. I too like WAR and have to suspend disbelief in determining how many wins a player is worth.

    • No “one” person decreed that SABR stats should stand alone. However, many times I will look at player comparisons and see the following argument: “Player A’s OPS is .750. Player B’s OPS is .800. Therefore player B is a better player.”

      Or “Yunel Escobar’s Range Factor is 5. Alex Gonzalez’ is 4. Yunel is the better defender.”

      The perception is that because the stat is an advanced metric (or the flavour of the week) it carries more weight and is free from limitations. I wanted to point out that they do have limitations which are not obvious on the surface.

      I see it in blog posts, I see it on Twitter, I even see it in the mainstream media and that makes me uncomfortable. Of course, if you believe that these arguments are a figment of my imagination, then I am railing against a completely fictitious straw man. If you agree with me and see this happen yourself, then my concerns are valid. It’s really up to you.

      • Early

        I remember years ago when this site started trying to conceive SABR stats that would go on the back of baseball cards. OPS is the only one. WHIP maybe (is it a SABR stat?). OPS and WHIP are popular because they use traditional stats to make a metric stat. And it is easy to explain what is good and what is bad, like traditional BA, homers, ERA etc. If one stat can’t aptly be used to compare player A vs B, what would the Triple Crown of SABR be to determine greatness of any type of player?

        • I put your Triple Crown question out to Twitter. These were the popular responses:

          1) AVG/SECA/EqBRR

          2) wOBA/HR/WAR (even though I wouldn’t classify HR as a sabermetric stat)

          3) wOBA/wRC+/OPS+

          I’d be inclined to side with the first 2 in #3. Not sure what my final stat would be.

    • I don’t need to see who “said” Saber stats should stand alone. I’ve seen plenty of comments at McCovey Chronicles and elsewhere with WAR being used as a blunt object. Player X has 3.1 WAR and Player Y has 2.5 so GM Z really screwed up. I don’t think it would add to the discussion to call out LouBrockLesnar69 for his silly usage of WAR.

  • Mrwalkerb

    I appreciate what you’re aiming for in this piece, but the idea of faulting Sabermetric stats for not being perfect implies either that they claim to be perfect or that other stats are perfect. I totally dig OPS and WAR but they certainly cannot be held as holy grails, for the reasons you point out here and several others. That being said, I hope you’d concede that they are better than older, more inaccurate statistics like RBIs and pitcher wins. The one stat that people point to that I think is horribly flawed is Isolated Power. Someone like Wily Mo Peña can have an enormous Isolated Power rating, but only because he either strikes out or goes yard. This stat would be great when combined with OBP or another stat, but in isolation it is just brutal. I’m kinda rambling now so I should stop…

    • Yes, I certainly concede that RBIs are useless and pitching wins have become far less important in the last 30 years as the game has changed.

      Good point regarding ISO.

  • Darrell

    What is your feeling on wOBA? Seems like it takes out the slugging advantage that OPS illustrates.

    • I like wOBA a lot. I can’t find much fault with it.

  • SABRMetrics make you uncomfortable when used in isolation? When has one single metric ever been used by a statistician performing a study vs. writing an article? Most use these to augment the headline stats vs. replace them. And these metrics will always have errors of interdependence. That is why you wouldn’t call them metrics but indicators. They measure things which are organic, and in that respect they will always have outliers. In many cases it is the outlier cases which are more interesting than .300 25 110 hitting 3rd playing first base. I wanna know why the guy with 32-15 sb/cs and a .657 ops is given more chances to hit than everybody else? SABR has made that point for 25 years. The answer is because some other guy was looking at one number and hoping about the other.