Saturday, February 11th, 2012

Butterflies & Dr. Statlove (or How They Learned to Stop Watching Baseball and Love the Numbers)


Posted by Brian Joseph on Saturday, June 20, 2009 at 7:55 pm

Sometimes I think there are baseball fans of the sabermetric sort who would rather watch FanGraphs’ Live Scoreboard than actually watch a game of baseball. This isn’t a knock on how people choose to enjoy the National Pastime, just an observation. Heck, having seen Adam Eaton pitch more times than I care to remember, there have been times I wished I wasn’t actually watching the game.

Those who love the numbers of the game often refer to sabermetrics, and almost treat it as a way of life, when discussing how they choose to enjoy the game. Wikipedia defines sabermetrics as the analysis of baseball through objective evidence, especially baseball statistics. While this is a simplified definition, I have always found it ironic. The notion that sabermetrics is truly objective is silly when there are a number of ways to “objectively” look at a situation statistically depending on your subjectiveness toward the game. Take player value, for example. Some prefer VORP, others look at WAR and others consider Win Shares. Each serves a purpose and each way to evaluate players has its following and detractors. So, it is truly not objective.

Recently, a few discussions and happenings in baseball compelled me to write this. Most of them are statistical in nature, but they are also shaped by actually watching the game and realizing that, while statistical analysis is vital to understanding the game, it depends on what is seen on the field. Let’s walk through four of those rather quickly:

Messing With The Johan

Last week, Johan Santana had the worst outing of his career. The Mets were pounded by the Yankees, and most of the damage was done on Santana’s watch: Johan allowed nine earned runs in just three innings.

The impact? Santana went from a 2.39 ERA to a 3.29 ERA. He also went from 8-3 to 8-4 in the W-L column (even though we’ve already been told by many that W-L records are meaningless, right?).

At this point in the season, Santana would have to throw 31 scoreless innings to return his ERA to that impressive 2.39. So, even if Santana threw three consecutive complete-game shutouts, his ERA would still not be as good as it was before he allowed nine earned runs in three innings. Does that make sense? Statistically, it does. Hence, if you evaluated Santana on ERA alone, even if he went 3-1 with three complete-game shutouts over a four-start stretch, that one awful outing would force us to conclude that he was a better pitcher, ERA-wise, four starts earlier.

Looking at the game itself, there was a point where Santana’s reputation may have cost him one or two earned runs, runs that were meaningless in a 15-0 loss but meaningful to Santana’s ERA.

In the fourth inning, after Santana allowed a two-run homer, a double and a single, Derek Jeter stepped to the plate. With the score 6-0 and two runners in scoring position, many lesser pitchers would have been removed at that point. Instead, Santana stayed in, Jeter singled in a run, and only then was Johan removed. Both of those runners scored off the Mets’ bullpen (as did many more), so instead of a three-inning outing with seven or eight earned runs, Santana was charged with nine. In the grand scheme of things, on that day, did it really matter whether he gave up seven, eight or nine earned runs? The outing was awful no matter how you look at it.

Since it is fresh in our minds, it is easy to consider these factors. However, when we look at Santana’s season three years from now, it is doubtful this outing will be considered when evaluating his overall performance. But those runs have a huge impact on the numbers. Had Santana allowed eight earned runs, it would have taken 27 scoreless innings to restore his ERA to 2.39, and seven earned runs would have been even more forgiving, at 23 scoreless innings. Again, on that day, did it really matter whether Santana allowed seven earned runs or nine? Only statistically.
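The scoreless-innings figures above follow from ERA = 9 × ER / IP. Here is a minimal Python sketch that checks them. The post doesn't give Santana's totals going into the start, so the baseline of 21 earned runs in 79 innings is an assumption: it is a pair of totals consistent with a 2.39 ERA before the outing and 3.29 after adding nine earned runs in three innings. Innings are counted in outs to keep the arithmetic exact, and "restoring" the ERA is read as the two-decimal ERA rounding back to 2.39.

```python
from fractions import Fraction

# ASSUMED baseline (not stated in the post): 21 ER in 79 IP before the Yankees start.
# 79 IP = 237 outs; this reproduces the 2.39 ERA before and the 3.29 ERA after.
BASE_ER = 21
BASE_OUTS = 79 * 3
BAD_START_OUTS = 3 * 3  # three innings against the Yankees


def era(earned_runs: int, outs: int) -> Fraction:
    """ERA = 9 * ER / IP with IP = outs / 3, i.e. 27 * ER / outs, kept exact."""
    return Fraction(27 * earned_runs, outs)


def displays_as_at_most(value: Fraction, target: str) -> bool:
    """True if value, rounded half-up to two decimals, is at or below target."""
    return value < Fraction(target) + Fraction(1, 200)


def scoreless_outs_needed(earned_runs: int, outs: int, target: str) -> int:
    """Count the extra scoreless outs needed before the displayed ERA is back to target."""
    extra = 0
    while not displays_as_at_most(era(earned_runs, outs + extra), target):
        extra += 1
    return extra


def as_innings(outs: int) -> str:
    """Format outs in baseball notation: 70 outs -> '23.1' (23 1/3 innings)."""
    return f"{outs // 3}.{outs % 3}"


if __name__ == "__main__":
    print(f"before: {float(era(BASE_ER, BASE_OUTS)):.2f}")
    for runs in (9, 8, 7):
        er, outs = BASE_ER + runs, BASE_OUTS + BAD_START_OUTS
        needed = scoreless_outs_needed(er, outs, "2.39")
        print(f"{runs} ER: ERA {float(era(er, outs)):.2f}, "
              f"{as_innings(needed)} scoreless IP to get back to 2.39")
```

Under this assumed baseline the sketch lands on 31 innings for nine earned runs, 27 for eight, and 23 1/3 (printed as 23.1) for seven, which lines up with the 31, 27 and 23 quoted above.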

Wieters, Jesus and PECOTA

Another particularly useful function of sabermetrics is in the realm of projections. Often recognized as the best of the projecting bunch is Nate Silver’s PECOTA found at Baseball Prospectus.

Personally, I’m a fan of BP’s annual even though I don’t completely agree with all of the PECOTA projections (but who can say they do when there are thousands of projections made about the 2009 season alone?).

Sometimes, PECOTA is frighteningly close (see last year’s projections for Evan Longoria) and sometimes not so much (there’s a good chance that Zack Greinke will outperform his 2009 PECOTA projection, which has him at 12-10 with a 3.96 ERA).

Then there’s the Matt Wieters 2009 PECOTA projection.

In case you’ve been living under a rock, the projections for Wieters after what was considered the greatest minor league performance of the last four decades (according to the annual’s write-up on “Orange Jesus”) were off the charts! How off the charts? 31 homers, 102 RBI, 105 runs scored and a batting line of .311/.395/.544! No wonder some were surprised that Wieters didn’t levitate from the on-deck circle to the batter’s box when he made his Orioles debut.

Now, I’m not here to criticize PECOTA or BP or projections in general. My question is: how could people actually believe such feats were feasible from a 23-year-old catcher who had not played above Double-A? Because the projection also called for Wieters to notch 649 Major League plate appearances in 2009, it was easy for me to see that while this projection should make me excited about what Wieters will eventually do in the Majors, I shouldn’t expect this much production out of him in 2009.

Why is that? For one thing, only one catcher (Russell Martin) reached 650 plate appearances in 2008. Also, the Orioles spent some money on a capable backup catcher in Gregg Zaun. Plus, it was made pretty clear that Wieters would not open the season on the Major League roster because of the way Major League roster rules and free agency work.
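Counting stats scale with playing time while rate stats don't, which is why the 649-plate-appearance assumption matters so much. Here is a minimal Python sketch of straight proration using the projected totals quoted above. The 450-PA workload is purely hypothetical, and linear proration is a simplification: real RBI and run totals also depend on lineup context.

```python
# PECOTA's 2009 counting-stat projection for Wieters, as quoted in the post.
PROJECTION = {"PA": 649, "HR": 31, "RBI": 102, "R": 105}


def prorate(projection: dict, plate_appearances: int) -> dict:
    """Scale counting stats linearly to a different number of plate appearances.

    Rate stats (AVG/OBP/SLG) don't depend on playing time, so the projected
    .311/.395/.544 line would stay the same under this simple model.
    """
    scale = plate_appearances / projection["PA"]
    return {stat: round(total * scale)
            for stat, total in projection.items() if stat != "PA"}


if __name__ == "__main__":
    # HYPOTHETICAL workload for a rookie catcher who opens the year in the minors
    # and shares time behind the plate.
    print(prorate(PROJECTION, 450))  # {'HR': 21, 'RBI': 71, 'R': 73}
```

Even at a generous 450 plate appearances, the same per-PA rates translate to roughly 21 homers rather than 31.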

So far, Wieters has one homer in 61 plate appearances and a batting line of .259/.295/.414. Obviously, in this small sample, he is nowhere close to his PECOTA projection, which was based on last year’s 530 plate appearances in the minors. It is worth noting that his 163 plate appearances at Triple-A Norfolk this year would have altered those projections and likely brought them down a bit.

You can’t fault BP for putting out the PECOTA projection. The numbers are what they are: you take a fairly consistent projection system, these are the numbers it produces, and what else can you do?

What you can fault is those who took the projections too literally. Wieters will almost certainly not live up to those projections, and some in the hyper-critical sabermetric cult… err… community will bash PECOTA for the miss. Another of the projections out there on Wieters will be closer, and that group will wear it as a badge of honor and use it as their calling card for why their number cruncher is better than yours.

Whether it is CHONE, Marcel, ZiPS, Oliver or whatever projections are floating out there, there are always going to be black eyes that need to be covered up by the spot-on projections that act as make-up for their blemishes.

While none of the other Wieters projections were as grand as PECOTA’s call, they were all pretty clear that we should expect Wieters to be a special player. Now we’ll have to see whether the “Orange Jesus” moniker is sarcastic or honorific.

Butterflies and Brad Lidge

Finally, there’s the matter of Brad Lidge’s performance this year and his overall effect on the Phillies. Statistically speaking, before Lidge went on the disabled list, there was speculation about whether the Phillies should stick with him in the closer role or instead force him to cop to a phantom injury or have him switch roles with excellent set-up man Ryan Madson.

The theory, independent of many non-statistical variables, was that Madson was performing well statistically and Lidge was not, so swapping the two or taking Lidge out of the equation altogether would produce a better Phillies team.

The point became somewhat moot when Lidge hit the DL with an actual injury, but the results show that statistics alone cannot tell us everything.

First of all, so far Madson has not been as effective as the closer as he was as the set-up man. In simple terms, the switch from Lidge to Madson has been, at best, a wash.

The butterfly effect of the move reached further than just Madson’s switch from set-up man to closer. The move affected essentially every member of the bullpen, as roles throughout the bullpen had to be shuffled to accommodate the switch. Without getting caught up in the numbers, the bullpen as a whole has been ineffective in Lidge’s absence, independent of Madson’s performances. While Lidge was pretty much awful as the closer until he hit the disabled list, the bullpen was very effective.

While the cause and effect of bullpen roles has been hotly debated for many years, in this specific instance, it obviously did not benefit the Phillies to have their relievers switching roles.

It’s Chaos I Tell You!

What it boils down to is that baseball is less tied to statistics and more tied to chaos theory. While statistics obviously play a major role in our understanding of the game, understanding the impact of chaotic behavior on the game is just as important.

Because of the complexities of the game, no statistic is unaffected by its environment.

Consider park effects and how they are used to try to level the playing field when studying player performance. They often come into play when talking about players who have impressive offensive seasons in hitter-friendly parks. Matt Holliday is the first name that comes to mind, but any home run hitter on the Phillies would suffice, too. However, to understand the deficiencies of park effects, one need only look at the way they are calculated and at the fact that the numbers depend on who is playing and pitching in that park in a given year.

Did I lose you? Let me simplify…

Let’s use Roy Halladay as an example. We all agree that Halladay is an elite pitcher, right? If not, stop reading and go watch video of him pitching and then come back to this. Welcome back! Now we are in agreement, right? Good.

Last year, Rogers Centre favored pitchers slightly. Those numbers are based on 81 games in Toronto, 15 of which Halladay started. No other stadium was impacted as much by Halladay being part of the equation as the Rogers Centre was. While park effects are often considered a great way to measure a stadium’s impact on the game, it is impossible to truly measure such a thing without recognizing that the measurement is inherently colored by the players and pitchers who put up the numbers in that stadium. The park and the player are interdependent, no matter how you slice it.
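To make the home/road interdependence concrete, here is a toy Python model of the simplest one-season runs park factor: (home runs scored + allowed per game) / (road runs scored + allowed per game). Published versions, such as Baseball-Reference's, layer further adjustments on top of this, and every run rate below is hypothetical. The park is built to be perfectly neutral, so any factor other than 1.000 comes only from where one ace's innings fall. The 113 1/3 home and 132 1/3 road innings are the Halladay split cited in the comments below, and the 132/41 split is the Nolan Ryan 1991 example from the comments.

```python
# Toy model: every number here is HYPOTHETICAL except the innings splits,
# which come from the comments (Halladay 2008; Nolan Ryan 1991).
GAMES = 81                 # home games = road games
INNINGS = GAMES * 9        # defensive innings per set of games (ignores extras, skipped halves)
TEAM_RS9 = 4.5             # team scores 4.5 runs per 9 innings, home and road alike
STAFF_RA9 = 4.5            # rest of the staff allows 4.5 per 9, home and road alike
ACE_RA9 = 3.0              # the ace allows 3.0 per 9 wherever he pitches


def runs_allowed(ace_innings: float) -> float:
    """Runs allowed over one set of games, split between the ace and everyone else."""
    return (INNINGS - ace_innings) * STAFF_RA9 / 9 + ace_innings * ACE_RA9 / 9


def basic_park_factor(ace_home_ip: float, ace_road_ip: float) -> float:
    """(Home RS + RA per game) / (road RS + RA per game) for a perfectly neutral park."""
    runs_scored = INNINGS * TEAM_RS9 / 9
    home = (runs_scored + runs_allowed(ace_home_ip)) / GAMES
    road = (runs_scored + runs_allowed(ace_road_ip)) / GAMES
    return home / road


if __name__ == "__main__":
    splits = {
        "even split": (120, 120),
        "Halladay 2008": (113 + 1 / 3, 132 + 1 / 3),
        "Ryan 1991": (132, 41),
    }
    for label, (home_ip, road_ip) in splits.items():
        print(f"{label}: {basic_park_factor(home_ip, road_ip):.3f}")
```

In this sketch an even split cancels out (1.000), the road-heavy Halladay split nudges the factor slightly above neutral (about 1.004), and a home-heavy split like Ryan's pulls it down to about 0.979, even though the park itself does nothing.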

Everything from the way the infield is cut, to the way an opposing pitcher tears up the mound to his liking, to equipment choices, to the umpire, to what a player had for breakfast, lunch or dinner can have an effect on what happens on the field.

Simply put, it’s not all statistical. If you believe it is, you sometimes end up making silly arguments and defending them with the kind of attitude that turns casual fans off and keeps many from seeing sabermetrics as a tool to better understand the game rather than a “cult.”

It also leads people to make statements like “Placido Polanco, Mr. Underrated” and then base their “statistical” evidence on RAR, which is basically a rating. (And how ironic that a Tiger would be good when it comes to RAR!)

I know, I know. I’m hard on those who love sabermetrics. My guess is that, while I love the numbers of the game, I will never be truly accepted in the sabermetric fraternity. But, at the end of the day, you can’t understand baseball just by looking at the numbers. The statistics of the game are too malleable to make an iron-clad complex argument without someone else manipulating the numbers slightly to fit their hypothesis. And no matter how snarky you are in your commentary or how sure you are in your conclusion, there’s another way to look at it.

It’s chaos theory at its best. Too bad it sometimes brings out the worst.


Comments

8 Responses to “Butterflies & Dr. Statlove (or How They Learned to Stop Watching Baseball and Love the Numbers)”
  1. ShigFace says:

    Just a word of advice: You should attempt to understand something before you bash it.

  2. Brian Joseph says:

    I’m always looking to understand things better. If I’m wrong about something specifically, I always welcome constructive criticism.

    Anything specifically you feel I’m missing the boat on here or is it just the fact that I offered some fundamental criticism of sabermetrics in general?

  3. JeffK says:

    The notion that sabermetrics is truly objective is silly when there are a number of ways to “objectively” look at a situation statistically depending on your subjectiveness toward the game. Take player value, for example. Some prefer VORP, others look at WAR and others consider Win Shares. Each serves a purpose and each way to evaluate players has its following and detractors. So, it is truly not objective.

    Let’s just start with this right here. This is a flat-out ignorant statement. Whether or not it is intentionally so, I have no idea. As it is for all intents and purposes a thesis statement for your article, its inclusion shows a lack of understanding without resorting to the same passive-aggressive defense of sabermetrics that you employ in deriding it.

    Science claims to be objective as well, in much the same way: there is a method (the scientific method, which may or may not surprise you to learn) that outlines procedures and avenues for offering proof of claims and providing numbers and data so that others can reproduce and test results. The concept of mass, and hence ‘weight’, is a result of that method. If I ask you “What do you weigh?”, you may or may not have an immediate response, just like if I ask someone next to me at a game “What is Matt Wieters’ value this season?” And yet, just as the question of his value has a number of different answers depending on the assumptions made (Which method are we using?), so does the question of your weight. The answer differs depending on the strength of gravity. Is your claim that science is not objective because there are multiple answers to the question?

    If so, you’re an idiot who can be dismissed on face. If not, then you’re simply someone who didn’t think through your claim, your thesis, that sabermetrics cannot be objective because it can objectively provide different answers to the same question. Regardless, you should attempt to understand something before you bash it. That’ll do for now.

  4. Brian Joseph says:

    I think somewhere in my writing I talked about those who discuss/enjoy/focus on sabermetrics find a way to turn the casual fan off by the way they make their points. I don’t think I did a good job of illustrating that because I was trying to keep word count down.

    So, thank you, JeffK, for calling me ignorant and an idiot… that pretty much makes my point.

    I do understand sabermetrics… I may not have made my stance clear in my writing: I find most of sabermetrics very useful. Part of my disdain for those who use/analyze statistics is that typically they take their amazing piece of filet mignon and slap it on the top of a trash can lid and serve it to me for dinner and then tell me I’m a moron if I don’t like their choice of service.

    Comparing value to weight is an interesting way to look at science vs. sabermetrics… however, I would contend that if you get specific, weight becomes more objective, while value in baseball still needs more work because it is more complex. If you want to argue that batting average is pretty objective (since it is simply a player’s hits divided by his at-bats), then I’d have to agree. But value is so difficult because there are so many variables.

    Back to the weight v. player value argument… if I specify parameters, it is easier for me to make weight objective than it is with player value:

    “What is your weight on Earth?” (Pretty much the same and kilograms and pounds have a mathematical conversion to equal each other so that doesn’t count as a variant in the answer.)

    “What is Matt Wieters’ Major League-level value?” (Does your valuation factor in defense? Does it overvalue defense? Does it favor OPS, is it altered by Park Factor? (which I would then ask if we considered the complexities of Park Factor))

    Anyway, I don’t think you are understanding my argument. I’m not deriding sabermetrics, I’m deriding those who think we can get true understanding of the game solely by sabermetrics and that the current accepted ratings may be flawed in certain ways.

    But I am completely ok with you bashing it without attempting to understand my point. That’s your choice.

    Thanks for your comments and criticisms, I appreciate the discussion.

  5. ShigFace says:

    I’m always looking to understand things better. If I’m wrong about something specifically, I always welcome “constructive criticism.”

    –It is pretty obvious by your silly attempt to take down park factors that you do not have the faintest clue about how they actually work and how they are actually used. Since you don’t know or understand park factors, it is completely pointless to attempt to bash them. There has been a lot of work and research done on park factors by very smart people who love to watch baseball, and to brush it aside based on an inaccurate characterization of how you think they work (as if no one had ever thought of it before) is both ignorant and arrogant.

  6. Brian Joseph says:

    Shig,

    I do happen to have a bit of a clue on Park Factors… just because you don’t agree doesn’t mean I don’t have a clue… I know, I know, that is a sabermetric axiom that is consistently prevalent in sabermetric arguments but it is just not true… I’ve read the Park Factors description on Baseball-Reference and have toyed with ideas on how to improve on it, so I’m not completely ignorant of it… I just choose not to accept it wholly as the end-all, be-all authority on the matter… although I’d say it is the best thing we have at the moment.

    I used Roy Halladay for a reason and actually, he can be used to argue against me because he actually threw 132-1/3 innings on the road and 113-1/3 innings at home so he actually impacted the Blue Jays more on the road than he did at home so you could have easily disproved my statement with that little fact… however, you chose to call me names. That’s ok, too… just doesn’t do much for your argument. Except show off your Internet muscles. (Tank top or sleeveless?)

    Anyway, I used Halladay for two reasons, one the obviousness of him being a dominant pitcher independent of the ballpark he is in… the other being, well, you’re the sabermetrician, you tell me… :)

    Thanks for the discussion… much appreciated!

  7. Rob McQuown says:

    I think people need to be less antagonistic when ideas are presented, instead of dismissing them out-of-hand. I’ve spent much time discussing baseball topics with Brian, and know that he has given more thought to these topics than most people realize. Stating that some of the accepted norms are overly-simplified models is an entirely reasonable and thought-provoking position. Sometimes a depth of understanding is indicated by an appreciation for what we don’t know yet.

    For example, on the topic of “park factors”, a few years ago, I posted about one way in which park factors could be improved, and since that time, it’s found its way into usage in some calculations of such factors. And I think there are many further advancements in that one tiny area which are available. Some thoughts:
    - I’d suggested that the Beltre/Sexson signings by Seattle years ago weren’t great for their park, but multiple M’s bloggers told me that those two guys hit their homers in such a manner that they go out wherever they are. Such “hit location” charts alter any standard “park factor”.
    - In chatting with Brian, it occurred to me that there are imbalances in Home/Road. I personally think his example here about Halladay doesn’t show anything, but what about Nolan Ryan’s Rangers years? Most years, he saw 50% more IP at home, with some being just silly (132 IP at home, 41 on the road in ’91, seriously). That had to influence the Texas “park factors” to some extent!
    - I know managers rarely adjust their starting rotation, but they will adjust their lineups and reliever usage for ballparks… so perhaps a park like Petco really has a different impact than the amalgamated numbers suggest, as managers are going to tend to tailor their lineups to the park, at least inasmuch as being more likely to use flyball pitchers and to give power hitters their off days in that park. A personnel-adjusted set of park factors would be fun to see.

    There’s a fun, exciting world of new innovations being made. While it’s certainly easier to dismiss anything that seems different as “idiotic”, it isn’t about ease.

  8. Colin Wyers says:

    Rob, it would greatly help the cause of being “less antagonistic” if Brian would lay off the clichéd and needless personal insults and stereotyping at the beginning of this article. If he wants his ideas taken seriously, it would be better not to start off tarring “statheads” as people who don’t watch baseball games and as a cult. Brian set the tone to begin with, so you’ll excuse me if his later complaints about the tone of the responses do not excite much sympathy in me.

    As for the park factors – if he seriously did understand the issues as expressed in his followup comment, then it was either reckless or dishonest or both to phrase his criticism of them the way he did. As he acknowledges himself, Halladay’s impact on his home park’s RPG is offset by his impact on his team’s road RPG. If you want to argue that there are differences in playing time between a player on the road and at home, that’s fine, but that’s not what the original article said, and I don’t see how it’s unfair to call it an ignorant criticism in that it ignores that fact. (Ignoring the fact even after you are aware of it is not, by the way, a defense. It is in fact the refuge of the worst cheats and scoundrels – to make a point by ignoring the facts inconvenient to the point you are making.)

    As for Wieters, simply Googling “wieters projection” would lead you to this critique (full disclosure: I wrote it) of that projection, from a “cultist” in Brian’s argot, written before the start of the season. Far from being an argument against the sabermetric community, the Wieters projection is a moment where the community actually went out, questioned something and found a different answer. If you want to know why the Wieters projection, as-is, was published in that form, don’t use it as an excuse to blast those of us who are in the analytic community.

    Coming from a company owned by Baseball Prospectus, it seems to me the height of insult to mock the sabermetric community for the Wieters projection, when we’ve known damn well for a while now that it was full of hot air. Why don’t you instead direct your critical energies at your bosses, who have yet to make a correction to the projection in the face of the outcry from the sabermetric community?
