2016 Baseball Hall Of Fame ballot analysis

Time again to see what the Baseball Hall of Fame ballot is about for the new year.   Need I remind you that I do like to judge based on the stats more than character.  The fact is, there are a lot of cheats and malcontents that are already voted in.   Pete Rose should belong for his true hustle and character on the field and not for his off-field character.  Apparently, he did make some in-game decisions that affected his own gambling, and he did bet with his own team in mind.  So, that’s reason enough for me to ignore him.  But you know something? He’s already in the Hall. The Museum part, anyways.   Jesse Spector’s good column in The Sporting News explains. 
But is this enough? Does he deserve to be voted with recent and longtime veterans for a plaque to go along with that?  To get lauded by thousands at Cooperstown with current Hall of Famers? To make a speech?  We have to believe that the powers that be at MLB and the BHOF&M do not want this spectacle. I don’t want Pete to be put on that pedestal. It’s enough that his positive contributions are already in the Museum.
As for steroids, I say no precedent had been set before these players came along.  There have been bulked-up players, sign-stealing mechanisms, all sorts of shenanigans going on forever in this game.  You always try to gain an edge, even a perceived one.  Ultimately, I say, put them all in the Hall, or at least on the ballot, with the exception of those who have been strongly alleged or found to have deliberately hurt the spirit of the game.  Bill Pennington’s story for the New York Times explains the infamousness of the HOF:
Too late to take down the plaques of lauded individual ballplayers. It should be enough that enough players have a place in the Museum itself. Not just the HOF’ers, either, but those who owned the moment, one moment, that changed the game.

With that typed, I’ll present my reasoning for the players I selected in two forms.
Firstly, I drew from 10 categories pulled from www.baseball-reference.com.
The first 4 sections deal with leading the league and being in the top 10 of the league in various categories, plus 2 sections by Bill James presenting a de facto approach. I can remember first reading about this in his 1983 Baseball Abstract.
Next there are the WAR and JAWS stats, emphasizing wins added to a team and also strength in numbers by position.
Finally there are the number of years by age and career a player’s total output has resembled prior HOF players’ stats at the time.
The 10 players with the most top-10 appearances across the 10 categories are those that I chose for my mythical ballot.  As with the actual ballot, it does not discriminate by position.
I post at the bottom of each category the ‘threshold’ number that it would take to be on the doorstep and also in the HOF.
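The counting behind the mythical ballot can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the category keys and the scores are hypothetical, and it treats a higher score as better in every category (a rank-style category such as JAWSpos would need the sort reversed).

```python
from collections import Counter

def top_n_appearances(category_scores, n=10):
    """Count how many categories each player finishes in the top n of.

    category_scores maps a category name to a {player: score} dict.
    Assumes a higher score is better in every category.
    """
    counts = Counter()
    for scores in category_scores.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:n])
    return counts

# Hypothetical scores for illustration only, not real player data.
sample = {
    "black_ink": {"A": 40, "B": 12, "C": 25},
    "gray_ink": {"A": 200, "B": 150, "C": 90},
    "war": {"A": 80.0, "B": 55.0, "C": 60.0},
}
counts = top_n_appearances(sample, n=2)
ballot_order = [player for player, _ in counts.most_common()]
print(counts)        # A appears 3 times, C twice, B once
print(ballot_order)  # ['A', 'C', 'B']
```

The players with the highest counts fill the ballot; ties at the last spot still need a head-to-head look.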
I have no real issue with these 10.  Maybe Trevor Hoffman, but it’s because he’s had an outstanding RP career, and yet his WAR is so much less than others. It’s not for a lack of trying; he’s merely a specialist.  Specialization can’t be measured against players who are in the lineup all the time.  Griffey is a mortal lock, as are, yes, Bonds and Clemens.  Larry Walker continues to get overlooked, as does Sammy Sosa; both are quite deserving.  Lee Smith shouldn’t have been on the ballot this long; his numbers are weaker in comparison even to other relievers here.  Best of the rest among those I haven’t selected are Edgar Martinez and Alan Trammell.
It took 5 top 10 showings to make it into the HOF based on these numbers.  I had to separate Mike Piazza from Gary Sheffield. It was kinda close but here’s my reasoning:
Sheffield does have the edge in terms of leading leagues in more categories. Piazza is very close in numbers in terms of both WAR and JAWS. The kicker was the similarity by age.  Piazza had 11 seasons in which the player whose stats most closely resembled his own at that age was a Hall of Famer. All were catchers: Fisk at age 24, Dickey ages 29-37, Yogi 37-38.  Also Javy Lopez 25-28.
For Sheffield, he has just 2 seasons: Gary Carter at age 22, and Duke Snider at 34.  After age 30, his career closely resembles Chipper Jones, plus Bagwell for 1 year.  Chipper is first eligible in 2018. If he gets in, that would make it easier for Sheffield for certain.  Does Chipper belong with this year’s ballot?  His Black and Grey Ink stats are too low. The HOF Monitor and Standards definitely fit in here.  His WAR and JAWS also are rather high.  7 similar batters to HOFers, but none by age. His top 10 at age 40 tho does provide great comparison.  To sum up, I’d put Chipper in, and Sheffield thereafter.
Here’s the attached spreadsheet for my standard mythical ballot:

Last  year I created an alternative set of categories, one that represents old and new stats that properly play into the fame quotient.  The main approach I used was to proportionally cast ballots by position based on the percentage of positions represented.  One sector for all offensive players, one for starting pitchers, one for relievers. Truly the real life balloting should be segmented this way, otherwise decent relievers have no chance against all-or-nothing super sluggers.  I’d truly want to see all positions have their own balloting. Hopefully that will make the Hall more fair for catchers and other positions that are often underrepresented.
The attached spreadsheet contains these categories:
Games/games played/finished: Playing into the longevity factor as a basis for fame.  Nomar is particularly a non-factor here while it helps the causes of Lee Smith, Curt Schilling and others.
WPA/LI bat/pitch stats: Best examples of # of wins given to team based on the leverage index stat, measuring pressure involved to win said games.  Makes for a better case for Sheffield in this manner and much less so for Trammell.
MVP Shares are voted on by the BBWAA members, the same ones who vote for the Hall.  The shares are a good, accurate indicator of who really belongs.  That helps out Hoffman for sure, and not so much Mussina or even Tim Raines.  Same case for CYA Shares, giving Hoffman an edge against other relievers at least.
Number of franchises: In this era of sports, the lower the number of teams played for, the better.  That factor strengthens the case for Bagwell, Edgar Martinez, Trammell and McGwire. Not so for 8-team veteran Lee Smith or Sheffield.
K/BF, HR/PA: Power stats, simplified, shown here inverted as batters faced per strikeout and plate appearances per home run.  Billy Wagner averaged a strikeout every 3.01 batters faced, about a strikeout an inning. Hampton was more of a control pitcher, explaining his 7.08 ratio.  The glamour stat of HR/PA certainly helps Piazza, McGwire (13.13!) and Sosa.
Postseason games played: Some love for Jim Edmonds, 64 games in all, 50 for McGriff, 49 for Kent.
My Fear Factor stat is the best marriage of power and contact accuracy, putting emphasis on the batter/pitcher conflict.  Interesting that Nomar leads all those on the ballot, with the usual suspects not too far behind.  The pitcher equivalent of the stat makes a better case than usual for Billy the Kid, and for Lee Smith as well. Schilling and Mussina seem weaker in comparison to their counterparts on the ballot.
I took the top 7 batters and watched for who fell in the top 10 in their categories the most, and added the top 2 SP, and the top reliever.
There is a definite consensus between the two ballots, with 7 players I’d green-light to Cooperstown.  6 more are question-marks, in on one but out on the other: Sheffield, Schilling, Hoffman, Raines, Walker, and McGwire.

2014 Super Bowl prediction (hard math)

I think I found the actual reference from my entry-level statistics course in college. Something referred to as binomial probability: taking a series of trials, each with 2 outcomes, a success and a failure. I recognized this approach when I studied the Favorite Toy method by Bill James, which has since been given a more formal name. The method is supposed to measure the probability of reaching a milestone amount, mainly applied to baseball. It’s a matter of taking 3 different numbers, ranking them, multiplying each number by the value of its rank, and dividing the sum of those products by the sum of the ranks. For example:
We’ll assign 1 a value of 1, 4 a value of 2, and 3 a value of 3.
We multiply each number by its value, (1×1), (4×2), (3×3), and then add these results.
So our total is
1 x 1 = 1
4 x 2 = 8
3 x 3 = 9
subtotal: 18
And we divide that number, 18, by the subtotal of the ranks (1+2+3 = 6) to get…3
The key number is 3: the likely result to follow 1, 4 and 3, based on the rank. This can be applied in a number of situations.
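For anyone who wants to check the arithmetic, the worked example amounts to a rank-weighted average. Here is a minimal sketch; the function name is my own and not part of any published method.

```python
def rank_weighted_average(values):
    """Weight the i-th value (counting from 1) by i, then divide by the sum of the weights."""
    weights = range(1, len(values) + 1)
    weighted_total = sum(v * w for v, w in zip(values, weights))
    return weighted_total / sum(weights)

result = rank_weighted_average([1, 4, 3])
print(result)  # (1*1 + 4*2 + 3*3) / (1 + 2 + 3) = 18 / 6 = 3.0
```

The most recent number carries the most weight, which is why the answer sits closer to 3 and 4 than to 1.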
So it is that I looked to this method for determining a future scoring outcome for the Super Bowl, to see if I can predict the score and the flow of the game. What I did, after a few years experimenting on this, was to take the quarterly scores for and against from a team’s games, looking at each boxscore of the regular season and playoff games. Applying the Favorite Toy method, the Game 1 scores have the least rank, and the conference championship games the most rank. I multiplied each game’s quarterly scores by its week number, added up those products, and divided by the sum of the week numbers (basically 1 through 18, which equals 171). I then produced a 2nd series of numbers by simply totaling the quarterly scores and averaging them per game (the sub total), no binomial coefficient to mess with here. Finally, I took the median of the 2 sets of numbers (with just 2 values, that is their midpoint) to use as the basis for the comparatives.
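Here is a rough sketch of those three lines of numbers for a single quarter. The function names and the four-game sample are hypothetical; a real run would use all 18 games, with the weights adding up to 171.

```python
def plain_average(scores):
    """The 'sub total' line: every game counts the same."""
    return sum(scores) / len(scores)

def week_weighted_average(scores):
    """The weighted line: game 1 gets weight 1, the latest game the most."""
    weights = range(1, len(scores) + 1)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def blended(scores):
    """Midpoint of the two lines (the median of two values)."""
    return (plain_average(scores) + week_weighted_average(scores)) / 2

# Hypothetical first-quarter points over four games, oldest first.
q1_points = [7, 0, 10, 14]
print(plain_average(q1_points))          # 31 / 4
print(week_weighted_average(q1_points))  # (7 + 0 + 30 + 56) / 10
print(blended(q1_points))
print(sum(range(1, 19)))                 # 171, the sum of weeks 1 through 18
```

Running the same three functions on each quarter, for points scored and points allowed, gives the lines of numbers used for each team.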
For this post, I’m comparing the two Super Bowl teams and measuring who is scoring and allowing scores at a rate quicker or slower at a particular quarter of a game. In this way, not only am I attempting to predict a score, but also the very flow of the contest.
I’ll spare the heavy work, and give you the core results and the comparatives here:

After 18 games, here is what the Broncos scored based on the sub total:
1Q: 7.77 2Q: 9.72 3Q: 8.61 4Q: 10.88. Total of the sub totals are 36.98
Points allowed:
1Q: 3.55 2Q: 7.27 3Q: 5.77 4Q: 7.44 Total: 24.03

Using the binomial coefficient, the scores are a bit different. It’s how different that matters:
Points for:
1Q: 7.97 2Q: 9.38 3Q: 5.92 4Q: 10.04 Total: 33.31
Points against:
1Q: 3.02 2Q: 6.18 3Q: 5.21 4Q: 8.05. Total: 22.46

And for the median of the 2 results for the Broncos, the basis for the study:
Points for
1Q: 7.87 2Q: 9.55 3Q: 7.26 4Q: 10.46 Total: 35.14
Points against:
1Q: 3.28 2Q: 6.72 3Q: 5.49 4Q: 7.74 Total 23.23
When comparing the 1st 2 sets of numbers, I am comparing the statistical trend, the sub total to the binomial. The binomial result is what’s current. In this way, we see that the Broncos are scoring a few more points in the 1st, a few fewer in the 2nd, a lot fewer in the 3rd, and fewer in the 4th.
In points allowed they are allowing less in the first and 3rd, a good deal less in the 2nd, and then it softens up in the 4th.
The trend: I gathered this by getting the net score (PF minus PA) of the sub total for each quarter and comparing it to the net score of the binomial. If the sub total is greater, or more positive, this means a negative trend is occurring.
1st quarter: +0.73
2nd quarter: +0.75
3rd quarter: -2.13
4th: -1.45
overall: -2.10
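The trend figures can be reproduced from the Broncos lines above. A short sketch, where the function name is mine and the trend is read as the weighted net margin minus the sub total net margin for each quarter:

```python
def quarter_trend(sub_pf, sub_pa, wtd_pf, wtd_pa):
    """Weighted net margin minus sub total net margin, quarter by quarter."""
    return [
        round((wf - wa) - (sf - sa), 2)
        for sf, sa, wf, wa in zip(sub_pf, sub_pa, wtd_pf, wtd_pa)
    ]

broncos_trend = quarter_trend(
    sub_pf=[7.77, 9.72, 8.61, 10.88],
    sub_pa=[3.55, 7.27, 5.77, 7.44],
    wtd_pf=[7.97, 9.38, 5.92, 10.04],
    wtd_pa=[3.02, 6.18, 5.21, 8.05],
)
overall = round(sum(broncos_trend), 2)
print(broncos_trend)  # [0.73, 0.75, -2.13, -1.45]
print(overall)        # -2.1
```

A positive number means the recent, weighted margin is better than the season-long margin for that quarter.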
Now we pause and look at the Seahawks numbers. First the sub totals:
Points for:
1Q: 4.16 2Q: 8.50 3Q: 5.38 4Q: 7.33 Total 25.37
Points against:
1Q: 1.38 2Q: 6.38 3Q: 2.66 4Q: 4.16 Total 14.58

Binomial results:
Points for:
1Q: 4.33 2Q: 8.71 3Q: 4.67 4Q: 6.98 Total: 24.69
Points against:
1Q: 1.41 2Q: 5.63 3Q: 2.49 4Q: 4.75 Total: 14.29
And the median of these, the numbers we’ll run with:
Points for:
1Q: 4.24 2Q: 8.60 3Q: 5.02 4Q: 7.15 Total: 25.01
Points against:
1Q: 1.39 2Q: 6.00 3Q: 2.57 4Q: 4.45 Total: 14.41
1st quarter trend: +0.14
2nd: +0.96
3rd: -0.54
4th: -0.94
Overall: -0.38
In words:
Seahawks scoring a bit more lately in the first, a bit more in the 2nd, much less in the 3rd, and less in the 4th.
On defense, they give up a few more in the first, much less in the 2nd, a bit less in the 3rd, and soften up in the 4th.
Now for the real marriage: We’ll take the median of 2 of the median lines, specifically Denver’s points for and Seattle’s points against, and then vice versa. We’ll do the median of those results further to finally get the point total we seek. And….we’ll do a final comparative as to how the scoring should go:

First the median of the Broncos’ points for and the Seahawks points against:
7.87 9.55 7.26 10.46 (Bronco offense)
1.39 6.00 2.57 4.45 (Seattle D)
4.63 7.77 4.91 7.45 = 24.76

Now vice versa:
4.24 8.60 5.02 7.15 (Seattle offense)
3.28 6.72 5.49 7.74 (Denver D)
3.76 7.66 5.25 7.44 = 24.11
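The marriage of an offense line and a defense line is a quarter-by-quarter midpoint. A small sketch using the lines above (the function name is hypothetical); it keeps full precision, so its totals can differ by a hundredth or two from figures that trim each quarter to two decimals first.

```python
def matchup_projection(offense_pf, defense_pa):
    """Midpoint of one team's points-for line and the opponent's points-against line."""
    return [(o + d) / 2 for o, d in zip(offense_pf, defense_pa)]

denver = matchup_projection([7.87, 9.55, 7.26, 10.46], [1.39, 6.00, 2.57, 4.45])
seattle = matchup_projection([4.24, 8.60, 5.02, 7.15], [3.28, 6.72, 5.49, 7.74])

print([round(q, 3) for q in denver], round(sum(denver), 3))
print([round(q, 3) for q in seattle], round(sum(seattle), 3))
```

The quarter-by-quarter values are what drive the read on game flow; the totals are only the starting point for the predicted score.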


Considering the quarterly trends and the marriage of medians, my predicted final score is Seattle 26, Denver 24.

Both sides should come out firing and that momentum will be present, with Denver doing best, in the 2nd quarter. 2nd half will be much more defensive, and Seattle will have its biggest impact in the 3rd. Both sides are very even in the 4th, so the end-of-3rd quarter result should be the best indicator.

Baseball Hall of Fame and Museum ballots have been delivered to the BBWAA members (read: long-time sportswriters, many of whom have never played the game), and now the countdown to the inevitable question begins again. Who will get inducted next year?
I have slightly expanded my reach of variables, tho still including stats from the venerable website www.baseball-reference.com. The 10 variables I use to evaluate all players are these: Black Ink, Grey Ink, HOF Standards, HOF Monitor, JAWS, JAWSpos, WAR, WAR7 (7 year peak score), most similarity scores by career stats, most similarity scores per age. Descriptions for these are in the website’s glossary; no need to repeat here.
The players with the most top 10 ranks are the ones I would vote for. If I have to break a tie, I’ll compare the players involved on a head-to-head basis.
I considered my own formula, the combo of Fear Factor and the Bill James stat % of Offensive Value (and its pitcher equivalent, which is my creation), but will leave this aside for now.
The attached spreadsheet reveals the research on the 36 players on the ballot.
I only counted the top 10 in each category. Here are the benchmarks that were created for each variable based on the ranks.
Black Ink: 100 to 23. Any player scoring under 23 was out of the top 10. Frank Thomas is out at 21. In at 23 is Don Mattingly.
Grey Ink: 336 to 157. In at 157 is Bagwell. Out at 138 is Sammy Sosa.
HOF Monitor: 336 to 171 (Schilling). McGwire is 170 and out.
HOF Standard: 76 to 54 (Mussina). Out at 52: Sosa, Glavine.
WAR: 162.5 to 71.5 (Palmeiro). Out at 70.3 is Trammell.
WAR7: 72.8 to 44.3(Glavine) Out at 43.7 is Sosa.
JAWS: 117.7 to 57.5(Trammell) Out at 55.9 is Edgar Martinez.
JAWSpos: 1 to 11. 11 players in this stat counted here. In at 11 are Edgar Martinez, Trammell, Palmeiro. Out at 14 are Biggio and Lee Smith.
Similarity score by career stats: 9 to 4. 11 players qualified. In at 4 are Raines, Mussina, Glavine. Out with 3 are Larry Walker, McGriff, Frank Thomas, Kent, Luis Gonzalez and Leon Durham.
Similarity by age ranged from 14 to 4 among 9 players, and I left the cutoff there. In at 4 seasons are Lee Smith and Frank Thomas. Out with 3 are Biggio, Walker, Glavine, Durham.
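Several of these cutoffs let in more than 10 players because of ties at the last spot (the career similarity category counted 11, for instance). One way to express that rule is sketched below; the function name and the sample scores are hypothetical, and a higher score is assumed to be better.

```python
def top_n_with_ties(scores, n=10):
    """Players at or above the n-th best score, keeping everyone tied at the cutoff."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    if len(ordered) <= n:
        return ordered
    cutoff = scores[ordered[n - 1]]
    return [player for player in ordered if scores[player] >= cutoff]

# Hypothetical scores for illustration only.
sample = {"A": 9, "B": 6, "C": 4, "D": 4, "E": 3}
qualifiers = top_n_with_ties(sample, n=3)
print(qualifiers)  # ['A', 'B', 'C', 'D'] -- C and D tie at the cutoff of 4
```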

My vote would go to these 10

Frank Thomas
Larry Walker

First four out: Piazza, Raines, Trammell, Sosa.

In all, 21 of the 36 players received some top 10 consideration, and only 4 of the first timers at that.
I also looked at the players up for the early election via the Veterans Committee. I chose 7 players from the list and compared their numbers to this year’s standard ballot. Only Tommy John and Ted Simmons had any impact for comparison. Joe Torre, who likely will be inducted as a manager anyway, got one top 10 entry, as did Dave Parker. Overall, none of those players deserve entry, but surely the executives and other figures will.

What I really want to study, beyond numbers, is to see the actual ballots of actual writers and guess how they might vote next. That may be a Part 2 to this post.

Here’s the full spreadsheet

Introducing Pitcher Fear Factor into sabermetrics and similarities

The idea came while doing research and working between numerous open tabs on my desktop, mainly www.baseball-reference.com and www.fangraphs.com   I gave thought to the idea of runs created from the pitcher’s point of view.  Bill James was concerned with what the batter does at the plate against a pitcher and not what happens on the bases.  Along these lines, from the previous series of essays, I focus on the batter-pitcher matchup as the beginning point to everything that goes on in the game. Of course.

Having exhausted my work with batters, I ideally wanted to present the inverse properties of the variables involved. Does a pitcher appear stronger for giving up fewer hits and fewer homers? Is this pitcher more valuable to his team when comparing runs allowed to the potential runs created from that initial contact?  And, can I build a profile suggesting how a pitcher’s tendencies make him a true power or finesse pitcher, or otherwise exude a Zen-like balanced quality?
My goal here is to find that Zen pitcher who can neutralize a batter, while
simultaneously keeping himself from his own tendency to be either a
gopher-ball or gopher-prone type.  We often refer to pitchers as either
power or finesse types, pointing to one’s K/9 ratio as a key stat.  I aim to
improve on the idea.
Further, more attention is given lately to a pitcher’s GB/FB ratio, which
changes very little over a career and may be a further indicator of success.
Elias Sports Bureau’s axiom bears out that when a fly-ball pitcher meets a
fly-ball hitter, and likewise for ground-ball types, the matchup tends to
favor the pitcher. When the styles are different, it favors the hitter. It’s
a truly modern improvement on the classic lefty-righty platoon adjustments.
But how often are these adjustments made based on the type of contact? And,
can we find the pitchers that take the matter of contact out of the equation
as much as possible, and keep the war at home plate?
From here, once we examine ground-ball and fly-ball types via the GB/FB
ratio, we can potentially determine initial outcomes for a ball game,
especially the first time through the order. We can see which pitchers might
have the most trouble or success against a hitter, and actually improve on
the common batter vs. pitcher history. It is this history that I’m using as
a testing ground to see if the Elias theory continues to hold water.
What follows are the results of this research. I took the top 7 pitchers of
each MLB team as ranked by IP, including at least one reliever (either short
or long). I then set to work on the formula:

PFF (Pitcher Fear Factor, patent pending) = { (PA – BB – K – HBP) / PA } x (SLG / BA)
(PA is batters faced by the pitcher, BFP; SLG and BA are those allowed, that is)
The first factor takes into account every PA, subtracting every result in
which the batter made no contact to settle the issue. This can be referred
to as Opposition Contact Average.
The second factor is the Opposition Bases per Hit, a simple division
equation here, expressed to 3 decimal places.
Multiplying these 2 factors to make the product PFF, as with FF for
batters, we come to a number, expressed to 5 decimal places, that is the
adjusted amount of opposing bases per hit. It is the number of bases per
hit that a pitcher allows per plate appearance if all the batters made
contact against him.  It favors the pitcher who keeps the result to home
plate and off the bases (more walks and strikeouts, and fewer balls in
play).  Again, like with FF, for a visually appealing look, I translate the
PFF by multiplying it with a base number of 300. The lower the PFF, the
better. The higher the PFF, the more it expresses a weak pitcher who is
likely to keep the ball in play and be a big threat to surrender extra base
hits.
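Reading the formula as Opposition Contact Average times Opposition Bases per Hit, scaled by 300, the calculation can be sketched as follows. The function name and the sample stat line are hypothetical, chosen only so the two factors come out to .460 and 1.600.

```python
def pitcher_fear_factor(pa, bb, k, hbp, slg_allowed, ba_allowed, scale=300):
    """PFF sketch: contact average times bases per hit, scaled for readability."""
    contact_avg = (pa - bb - k - hbp) / pa    # share of PA that end in contact
    bases_per_hit = slg_allowed / ba_allowed  # SLG/BA equals total bases per hit
    return contact_avg * bases_per_hit * scale

# Hypothetical line: 100 batters faced, 20 BB, 30 K, 4 HBP, .200 BA and .320 SLG allowed.
pff = pitcher_fear_factor(pa=100, bb=20, k=30, hbp=4, slg_allowed=0.320, ba_allowed=0.200)
print(round(pff, 1))  # 0.46 * 1.6 * 300 = 220.8
```

Passing scale=1 returns the raw product of the two factors before the 300 multiplier is applied.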

Here’s your top 5 in PFF, inclusive of relievers:
Aroldis Chapman, Randy Choate, Fernando Rodney, David Hernandez, Brandon League.
Top SPs in PFF include: Gio Gonzalez, David Price, Jason Hammel, Edinson Volquez, Yu Darvish, Stephen Strasburg.

Here’s the full spreadsheet

I’ve included a list of PFF by teams. Not terribly surprising to see the Tampa Bay Rays lead in this category.

From here, we’re getting mighty experimental. So far we’ve seen the strength of a batter and his ‘natural’ tendency to hit for power or average or some combo of this. Within this we’ve seen 6 examples depending on how the stats bear out.  What I’m doing with the pitcher side of things is to present a duplicate example: what the pitcher induces that makes him tough to make contact or hit for power against. That is, if that is possible! After all, the pitcher starts the action. We already know about gopher-ball and gopher-proof and groundball-flyball types. What more can we read into except to take away the possible benefits of the hitter?  At the same time I want to know if that pitcher is naturally attuned to giving up homers or singles or has some Zen-like quality that has him rather balanced.
Recalling the Elias axiom again: similar styles between batter and pitcher always favor the pitcher. If the styles are rather different, the batter has the advantage. That chicken-and-egg thing, you know.
Further, recalling “The Tao Of Baseball”: “When confronted with Yang, employ Yin; When facing Yin, employ Yang.” The book goes into describing how to defeat both types as well as defeating the same side of the coin.  I won’t plagiarize so I encourage you to purchase the book through Fireside or your online dealer of choice.
Having studied the batter side much more than the pitcher, I can grok Bill James’ reasoning about using the offensive value %. Do pitchers reflect a shadow to that batter, taking away some of that value, neutralizing the effect? Or does he make it easier for the batter?  Admittedly I don’t have the answer.  I’ll roll with an extreme example.

Aroldis Chapman in the PFF ratings is #1 on my list. Looking at his batter/pitcher matchups from 2012, I wanted to find which batters gave him any real trouble.

Here’s what stood out: Jordan Pacheco was the only batter to get 2 hits off him! 2-for-3 in 3 PA.  As for Chapman’s easier pickin’s, Matt Downs went 0-for-6 with 5 Ks, Ryan Braun 0-for-4 with 3 Ks.
Recall that Chapman allows batters to make contact just 46% of the time, taking away all the BBs, Ks and HBPs. The 1.60 bases per hit is very close to the league average of 1.58, so he’ll allow his share of extra base hits, rather normal. But don’t count on them.  We’re talking about a thoroughly Yin pitcher.  Now let’s see what kind of batters we’re talking about.

Downs: contact: .738  bases per hit: 1.836. FF x 300: 406.408, ranking 24th in the league. % of Offensive Value is a sharp 52% also in the top 25 in that category. So this is a very balanced hitter and very dangerous. Somehow he meets his match when facing Chapman.  Strength vs strength. Tie in this case goes to the pitcher.
Braun: contact: .701, bases per hit: 1.865. Both numbers above league average. FF x 300 is 392.208, also above average. Not elite but definitely better than most. % of Offensive Value: .429. Big-time power hitter. He’ll beat you with the long ball given the chance.  But… not against the power pitching of Chapman, and here again, the similar matchup favors the pitcher.

Jordan Pacheco contact average: .829.  Bases per hit: 1.362. Mixed bag here. FF x300: 338.727. Right around the league average of 340. Not going to scare many pitchers but is not a pushover. So…power or average? Offensive Value for Pacheco is .689! Definitely a different type of hitter, solid singles type.  And this speaks to the similarity theory.

One more example, using another strong pitcher in Randy Choate. Opposing contact average: .636. Bases per hit: 1.27. If he was a hitter this would be a real weak fish. That’s how strong Randy is.

He faced his worst opposition in these batters:
Gregor Blanco (3-for-5, 3 doubles): contact: .653. BpH: 1.409. FFx300: 276. OV%: 45%. Very weak hitter, all numbers below league average, with a mild power focus.   How does this explain his success against Choate? Frankly it doesn’t as both sides are similar and should favor Choate in this case.  We’ll leave this one aside.
Michael Bourn (3-for-3, 3 singles): contact: .675. BpH: 1.427, for an FFx300 of 288 and change. OV% is .498, one of the better balanced hitters in the game.  So…Blanco and Bourn are similar to themselves! And both find something in Choate’s pitching that looks real good.  Going with this angle, our next 2 batters should be something of the opposite. Let’s look:

Choate fared best against:
Jon Jay (0-for-6): contact  .760. BpH 1.31. FFx300: 298 .  Off Vl% is  .604.  Contact average is higher and a bit above the league average. FF number well below league average, so he’s not going to be much of a challenge to a pitcher. The Off Vl% is higher tho. He’s strongly geared to singles and sacrifices and walks, which is certainly different than Blanco and Bourn.
Brandon Crawford (0-for-3, 2 BBs)  contact: .724. BpH: 1.40.  FFx300: 305.  Off VL: .570.  Truly a singles hitter, also not a real pitcher’s threat.
So far with this example, this much holds water: Off VL per batter definitely points the way toward a batter’s success against a pitcher of a certain type.  Jay and Crawford are inclined to hit for average or play smaller ball. This approach does not work against Choate at all.  Bourn and Blanco are kryptonite tho for taking a centered/power approach.  Mind you: All 4 batters are relatively weak compared to other batters. It comes down to the Off VL % for certain to figure out Choate.

This may prove to be a good tool for changing a lineup, bringing in power hitters to replace the singles types.

Yes there are more columns in the PFF table. We’ve covered enough here. In a future post I’ll fill in those columns and attempt to make sense of it all.

KEYNOTE: Ranking the Fear Factor and Offensive Value stats

This post aims to sum up the prior post. So far I’ve revealed the Fear Factor (contact average and bases per hit), Percentage of Offensive Value (balancing point between power and average/sacrifice tendencies) and I’ve given you an historical and recent sample from each.

Putting it together we can construct something of a profile for each player and gather insight on what that batter’s tendencies appear to be. Further the batters can be grouped in the different parts of the matrix and we can also see who would best fit the prototype of an ideal batter. And who is that ideal, you ask? He should have:
High Fear Factor (top 10% of sample)
Central Offensive Value % (as close to 50% as possible)

What I did was rank each of the top 40 sample results from players in the 2012 sample, and the top 40 of the historical sample. From this the average rank in both samples will reveal which are the more complete hitters. 15 batters achieved top 40 status in both, and here’s how they stack up:
1: Chuck Klein
2: Vladimir Guerrero
3: Earl Averill
4: Ernie Banks
5: Moises Alou
6: Joe DiMaggio
7: Juan Gonzalez
8: Al Simmons
9: George Brett
10: Stan Musial
11: Cal Ripken
12: Billy Williams
13: Rogers Hornsby
14: Lefty O’Doul
15:  Ryan Braun
I assigned 1 point for the best %OV value, and 2 points for each ascending number away from the central figure. The values in OV I then doubled to get the rank, as I could go only 20 in either direction.  For the FF I simply charted the top 40. I combined the 2 point totals and ranked from lowest total to highest.  Everyone else listed made only some portion of the top 40. To these players I assigned a de facto score of 80 (roughly the midpoint between 41 and 121), added to the total they ranked in. 48 other players fell into this group. Here’s the top 10 best of the rest.
16 Tony Perez
17 Johnny Mize
18 tie Hank Aaron
18 tie Minnie Minoso
20 Albert Pujols
21 Jackie Robinson
22 Albert Belle
23 Lou Gehrig
24 Willie Mays
25 Hank Greenberg

Here’s the ranked historic spreadsheet attached to see the remainder of the results.

As times change for both active players and when I uncover more historical data, the historical sample will definitely grow in shape.  From time to time, maybe after I receive enough suggestions for players (please suggest, please?), I will revise this list and rerank. There are so many players that can be on this list. I just stuck with specific categories to begin with.
So in this unofficial example, Chuck Klein is the most complete ballplayer in history, and not far ahead of, of all players, Vlad The Impaler as his modern-day equivalent.

Now for the 2012 rankings: For the sake of brevity I’m tracking those who are in the top 40 in either FF or %OV and applying the ranks in like manner as the historical.  As I crunched the numbers, just a few players made the top 40 on both sides. In order of #1 through #4, here they are:
Aaron Hill (437.748 FF, 3rd), .496 %OV,
Adrian Beltre (425.045 FF, 7th), .510 %OV,
Troy Tulowitzki (407.334 FF, 23rd), .497 %OV,
and Adam Jones (396.300 FF, 32nd), .503 %OV.
These four are thus the most complete ballplayers from 2012, sporting a combination of making great contact, a threat to hit for power, and each showing no proclivity to rely on solely power or average to contribute to the team.

All the other players matched just one column. From here I’ll leave it to you to see which players match up with whom.  Just put the players who rank in 1 of the categories to be better than those who don’t rank at all.

Here’s the full spreadsheet of 2012 players with the ranks:

OK, OK, 2 examples, good examples at that. The top 2 players in FF did not rank in the %OV list:
Jose Bautista proved to be the most dangerous player by FF, with a 448.567 score. It’s paired with a walloping .301 %OV, absolutely dead-red sort of hitter.   Then there’s Yuniesky Betancourt, who’s 2nd in MLB for FF with 447.270.  And he’s on the opposite part of the %OV spectrum, proving to hit more for average than power, with his .587.  That score is closer to a perfect 50%, so Betancourt is more complete than Jose Bautista…
Remember that the ideal players should have an FF score as high as possible, with an %OV score as central to 50% as possible. Whether Bill James intended a central score or not, this is how I happen to see this formula.
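The two ingredients of that ideal can be laid side by side with a few lines of code. This sketch is not my full point system: it only ranks the six players named in this post by FF and measures how far each %OV sits from 50%, so the order will not match the league-wide ranks quoted above. The function name is hypothetical.

```python
def completeness_table(players):
    """Return (name, ff_rank, ov_gap) rows, sorted by FF from highest to lowest.

    players maps a name to (fear_factor, offensive_value_share).
    ov_gap is the distance of %OV from the ideal 0.500.
    """
    by_ff = sorted(players, key=lambda name: players[name][0], reverse=True)
    return [
        (name, rank, round(abs(players[name][1] - 0.5), 3))
        for rank, name in enumerate(by_ff, start=1)
    ]

# 2012 figures as quoted in this post.
sample_2012 = {
    "Aaron Hill": (437.748, 0.496),
    "Adrian Beltre": (425.045, 0.510),
    "Troy Tulowitzki": (407.334, 0.497),
    "Adam Jones": (396.300, 0.503),
    "Jose Bautista": (448.567, 0.301),
    "Yuniesky Betancourt": (447.270, 0.587),
}
table = completeness_table(sample_2012)
for row in table:
    print(row)
```

Bautista tops the FF column but sits farthest from 50%, while Betancourt is second in FF with a much smaller gap, which is the comparison made above.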

Here’s hoping you’re inspired to compare your favorites for your fantasy league or otherwise start or end some good baseball discussions!

KEYNOTE: Reintroducing an old Bill James formula, applying historic/modern significance

I don’t recall the date I had begun walking to the library by myself but it was probably once I turned 13. The walk was 4 long blocks, which would eventually cover 20 minutes’ time, walking at my mom’s pace, as we would go up either the main street in our neighborhood in Southern Brooklyn, or we’d be curious and go up one of the nearby side streets to admire the housing and elements. I share in my mom’s everlasting sense of wonder and curiousness.  Enough so, that I did immediately graduate to one whole half of one floor of branch 44. Everywhere else in the library there were the adult books, the grown-up books, the mysterious microfilm, and the wooden racks that held the spines of the daily newspapers near the front.  We split time between that one and branch 56, near the subway, and a longer haul, always taking the scenic route, often combined with a shopping trip. Mom was always about ensuring there was enough food at home, even if there already was enough for a couple of weeks.
I seem to recall gravitating to a few particular sections, with no real guidance as to which subject to lean to. The sports-related material was in the 790s of the Dewey Decimal System. Here I borrowed a good deal of books. One of these was an inspiring read about the Triple Crown horse racing winners. Another that comes to mind was a book on New Games, something of a hippie invention, but one that introduced such concepts as Earthball.

I began following baseball around 1980, having passively taken an interest in the sport around 1976, hearing so much about the signing of one Reggie Jackson to the Yankees, and hearing about next to nothing but losing from the Mets side of things.  Dad seemed to be more of a Yankees fan, but I imagine he still cared about the Dodgers, as he was in his own teens attending games at Ebbets Field. I never asked him about the 1950s rivalries between them, the Yankees and Giants.  I grew my interest in baseball through not only watching with my dad the Yankees AND Mets games, but the NBC Game Of The Week, a staple for decades at 1pm Saturday for pre-game, 1:15pm for first pitch. The 2 big names I had latched onto for heroes were Reggie, and Pete Rose. Maybe it was the constant exposure from TV (Reggievision, anyone?), or the way Dad pointed out the true hustle that was Charlie Hustle. While my folks did what they could to keep up with the neighbors (I don’t think we actually had any Joneses in the 6-story co-op we lived in), I was enrolled for one spring in the local Little League. I recall wearing #14 for Pete, and was something of a free swinger. I was OK, tho over the years I figured out how to swing late and develop more of a sense of patience. What did I know about tagging up, or coaching my teammates? I didn’t. But I did have fun, and I think the coaches understood.

So 1984 comes around, and I don’t stick around for Little League (the folks balked at having to pay for multiple seasons and probably wanted me on a short leash as it was).  As I check out other subjects like astronomy, astrology, some reference books, I find a new book among the sports. It’s The 1983 Bill James Baseball Abstract. Hello…..

This page from 2004 documents much of the innards of this publication.
Everything there is true. I learned about the wonders of Runs Created, The Favorite Toy, ballpark bias, The Pythagorean Method, a de facto Hall of Fame (and Museum) and thoroughly original analysis to explain what was on our baseball cards and TV sets. Sports radio was still some years away, as was the idea of pushbutton knowledge.  I was still yet to explore Elias’s masterwork publication in the annual Baseball Analyst.
I recall the library having to raise money to stay open more hours, and I was overjoyed to find that this book was for sale. And so I did purchase it, probably for not more than one dollar.  My appreciation for the game grew and multiplied.

It’s in recalling this book that I introduce a formula that was introduced right alongside the Runs Created stat. I have not seen any further printing of the formula but as I use it, it’s become an absolute staple in my search for ideal players, whether to follow in fantasy, or replay league or otherwise for mindless historical comparatives. The latest such comparative was done in my previous post.

But now we turn to this formula: Percentage of Offensive Value.   James explains the stat to mean that a player uses more of his value when he bunts and sacrifices, and less so when swinging for the fences. From what I recall, he wrote nary a paragraph on the subject, with a small diagram explaining the difference. I believe he was properly giving more positive weight to the players who bothered to move the runners over in classic small-ball instead of playing solely for power or something derivative of Earl Weaver’s three-run-HR model (no SBs, no bunts, just get on base, and get them home).

The formula itself:
%OV = (H squared / AB) / RC
I don’t quite get why squaring a number should have the desired effect, tho it might have something to do with the number of standard deviations.
RC is Runs Created, calculated as RC = ((H + BB) x TB) / (AB + BB). This is the original version, while its many spawns have included SB and sacrifice numbers.  The version I’m using is lifted from http://www.baseball-reference.com. I’m pretty sure they stick to the old formula, but they don’t spell out exactly which version.
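One possible reading of where the square comes from, offered as a sketch and not as anything James spelled out: take the basic Runs Created form, RC = (H + BB) x TB / (AB + BB), and imagine a player whose every hit is a single and who draws no walks (both are simplifying assumptions of mine). Then TB equals H and BB is zero, so hits show up twice in the numerator, once as times on base and once as total bases.

```latex
\begin{align*}
RC &= \frac{(H + BB)\,TB}{AB + BB} \\
RC_{\text{singles only}} &= \frac{(H + 0)\,H}{AB + 0} = \frac{H^{2}}{AB}
  \qquad (TB = H,\ BB = 0) \\
\%OV &= \frac{H^{2}/AB}{RC} = \frac{H^{2}\,(AB + BB)}{AB\,(H + BB)\,TB}
\end{align*}
```

Read this way, the square looks like plain algebra and not a statistical device. For a batter with no walks at all, the last line collapses to H / TB, the inverse of bases per hit.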

This formula, after dividing by the number of AB, spells out how many runs the player would create if he had nothing but singles and BBs. When divided by the actual number of runs created, we’ll get a %, very typically between 25 and 75%, with numbers outside that range in extreme cases.  The higher the number, the greater the share of singles, walks and potential sacrifices used in creating runs, while a lower number suggests a power hitter is at work.
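For anyone who wants to run the numbers themselves, here is a minimal sketch in Python. It assumes the basic Runs Created form, (H + BB) x TB / (AB + BB), which may not be the exact version baseball-reference.com publishes. The function names and the stat line are my own inventions for illustration, not a real player.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def pct_offensive_value(h, bb, tb, ab):
    """%OV as given in the post: (H squared / AB) divided by Runs Created."""
    singles_only_runs = h ** 2 / ab
    return singles_only_runs / runs_created(h, bb, tb, ab)


# Hypothetical stat line, invented for illustration only (not a real player).
example = {"h": 150, "bb": 50, "tb": 250, "ab": 500}

print(round(runs_created(**example), 1))         # 90.9
print(round(pct_offensive_value(**example), 3))  # 0.495
```

This made-up batter lands at .495, about as close to the 50% mark as it gets. Swapping in RC straight from a stats site instead of recomputing it may shift the result a little if the site uses a later version of the formula.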

My own appropriation of %OV takes this standard and sets a balancing point of 50%. This mark introduces my concept of the Zen batter (a nod here to the Tao Of Baseball). Players within the 45-55% area are relatively balanced, adept at hitting for average and power alike, taking a few chances but also able to take one for the team, as it were.   Players below 40% and above 60% gather real identities;  you can see the power and sacrifice qualities inherent in a ballplayer.   Ballparks I imagine will have some effects but these players generally don’t change stripes much at all.

Let’s go back to our 2 spreadsheets:
2012 MLB players:

Historical sample:

That last column with %OV is what we’re focusing on. Compare this percentage to the FF number and you’ll eventually uncover a different kind of matrix, one that represents more of the sense of a complete ballplayer.  Here are the 6 cells of the matrix:

Higher Fear Factor number (say, converted # in the 400s) with a lower %OV (below 45%)
In the historical sample, here are such players: Babe Ruth, Lou Gehrig, Aaron, Mays, Griffey, et al.

Higher Fear Factor number with a more balanced %OV: (IDEAL) Billy Williams, Moises Alou, Chuck Klein, Vladimir Guerrero, George Brett.

Higher Fear Factor number with a higher %OV (above 55%): (few examples, but 2012 players include Manny Machado, Brandon Phillips, Yuniesky Betancourt, Juan Rivera, Giancarlo Stanton)

Lower Fear Factor number with lower %OV (rare types: Rickey Henderson, Mike Cameron)

Lower Fear Factor number with a central %OV: Jose Guillen.

Lower Fear Factor number with higher %OV: Billy Hamilton, Hughie Jennings, Willie Keeler, David Eckstein, Lou Brock, Rod Carew.
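The grid above can be sketched as a small lookup. The cutoffs are my reading of the text: an FF of 400 or more counts as "higher" (from "converted # in the 400s"), and %OV is split at 45% and 55%. The function name and the exact FF cutoff are assumptions for illustration. The sample numbers are the FF and %OV pairs quoted in these posts.

```python
def matrix_cell(ff, pct_ov, ff_cutoff=400.0, low=0.45, high=0.55):
    """Place a batter in the FF / %OV grid.

    ff_cutoff, low and high are assumed thresholds, adjustable to taste.
    """
    ff_label = "higher FF" if ff >= ff_cutoff else "lower FF"
    if pct_ov < low:
        ov_label = "lower %OV"
    elif pct_ov > high:
        ov_label = "higher %OV"
    else:
        ov_label = "central %OV"
    return ff_label, ov_label


# (FF, %OV) pairs as quoted in the posts.
players = {
    "Mays": (418.772, 0.418),
    "Mantle": (365.015, 0.353),
    "Snider": (400.221, 0.423),
    "Bautista": (448.567, 0.301),
    "Betancourt": (447.270, 0.587),
}

for name, (ff, pct_ov) in players.items():
    print(name, matrix_cell(ff, pct_ov))
```

With these cutoffs Mays, Snider and Bautista fall in the higher FF, lower %OV cell, Mantle in the lower FF, lower %OV cell, and Betancourt in the higher FF, higher %OV cell. Nudging ff_cutoff moves borderline cases like Snider, so treat the cells as conversation starters and not hard lines.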

Comparing Ruth to Gehrig again: Both have more of a true power number (surprise, but it’s Gehrig whose numbers are more centralized).

Willie, Mickey, Duke again:
Mays (418.772, .418)
Mantle (365.015, .353)
Snider (400.221, .423)
All three are strongly power oriented, Mantle more so than the others. At the same time, Mays is the most ‘dangerous’, with that FF number, while Mantle is more average at 365.

In the final post I’ll explore the concept of the ultimate batter based on the further comparison of these two stats.

KEYNOTE: Using a new baseball formula with a look to the historical (part 2 of 4)

What I want to convey with Fear Factor is the measure of how dangerous a player is at the plate, the threat of hitting a home run without missing so many pitches. The higher the number, the better. Anything above the converted number of 360 suggests a power hitter at work.
For a refresher course, read the prior post on how this is calculated.

Naturally, after focusing on new formulas using players from the 2012 season, I gravitated to building an historical sample using players of yesterday and yesteryear.  For the sake of this study, I took the top 25 players who led in these stats, career-wise: plate appearances, batting average, slugging average, strikeouts, walks, hit by pitch, hits and home runs.  I took out a handful of players whose stats were incomplete (all from the prior turn of the century), and added some players just for curiosity.

Here is the attached spreadsheet with this historical sample

Meanwhile: It’s hard to find a modern day hitter with a strong contact average. Best such examples outside of the top 12 include Brooks, Rose, Musial, Banks, Brett. Even Fernando Vina makes his presence felt here.

Joe DiMaggio impresses as having the best FF number of any player I’ve researched. An amazing combo of contact stats and bases per hit, second to none.  And if you recall the quick and dirty method, matching HRs to Ks, his stats are just about even. Find a player who compares that way.  I haven’t, yet.
Gehrig outpoints The Babe in this formula. Sure, the Babe had better firepower, but Lou’s higher contact average makes the difference.  No real surprise that Hank and Mays are on the top 12 list but how about Ernie Banks? Another surprise among the often overlooked is Johnny Mize.

Again, the matrix is to be recalled:
Batters with high contact average, high bases per hit: Shoeless Joe and Lefty O’ are ideal here.
Batters with high contact average, low bases per hit: Willie Keeler the best example.
Batters with low contact average, high bases per hit: Thome and Dunn are the ultimate free swingers.
Batters with low contact average, low bases per hit: Too many to mention.

Ted Williams might not be the greatest hitter ever, but he’s totally in the convo. His contact average is at the lower 30% mark, and his bases per hit are around the top 25% mark.

Pete Rose gets tepid marks but he was especially good at making contact, much more so than for any power he displayed.

Willie, Mickey, Duke? Measuring the FF, it’s Mays, by a fair amount, over The Duke.

I expect this to start conversations as well as end a few. I’ve highlighted the top 12 in each category. The stat in the last column will be dealt with in the next post.