Times caught stealing (CS) were not recorded for the National League from 1876 to 1914, from 1916 to 1919, or, more recently, from 1926 to 1950. The American League followed this course from 1901 to 1913 and from 1916 to 1919.

DMB simulates stolen bases from the run and jump ratings. The game does not key off the actual SB and CS stats; the event tables don’t include these as factors. Luke Kraemer, DMB tech support engineer, said this about the matter on the DMB Forum, June 19, 2006: “SB and CS are mostly driven by the Jump and Steal ratings but the CM is aware of base stealers with extreme performances like Henderson, Coleman, Wills, etc. and makes some adjustments for these really aggressive runners.”

Often, third-party pre-1927 database developers have determined their run and jump ratings from stolen base opportunities. So, it’s not necessary to have actual CS stats for DMB to replay a season; however, CS stats will tune up the stolen base game more realistically than rating players by stolen base opportunities, so I created estimates for the Wiki Seasons and for second editions of 1894 and 1895. While rating players by stolen base opportunities works OK, my observations show it sometimes sets the steal ratings badly. Besides, I wanted a better idea of how the players performed, and a good estimate of times caught stealing provides this, while a “0” in the CS column would reflect nothing.

I didn’t like the CS estimates from the June 26, 2006 edition of the 1902 Wiki Season, so I modified them greatly. I worked on estimated times caught stealing from late December until late May, 2007. In the end, I created estimated CS stats for 11,410 player profiles for seasons ranging from 1894 to 1919, not counting composites displayed in the player register. I tried different methods and tested endless variations. In the long run, I applied a modification of Bill James’s speed score, SpS4, runs scored as a percentage of times on base: (R - HR)/(H + BB - HR).
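As a minimal sketch of the runs-per-times-on-base ratio quoted above, the calculation could look like this in Python. The function name and the sample stat line are hypothetical and are not taken from the database or from any real player.

```python
def runs_per_time_on_base(r: int, h: int, bb: int, hr: int) -> float:
    """Runs scored as a share of times on base: (R - HR) / (H + BB - HR)."""
    times_on_base = h + bb - hr
    if times_on_base <= 0:
        # Assumption: the ratio is treated as undefined for a player
        # with no non-home-run times on base.
        raise ValueError("no non-home-run times on base")
    return (r - hr) / times_on_base


# Hypothetical stat line: 100 R, 180 H, 60 BB, 5 HR.
print(round(runs_per_time_on_base(r=100, h=180, bb=60, hr=5), 3))  # 0.404
```

The same function serves for a player line or for summed league totals.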
The usual procedure is to apply this to both league totals and individual players. I applied a different basis for the league total instead of simply applying the numbers from the single season in question. To explain why, I need to touch on another modifier: the expected SB rate. A typical CS estimate formula looks like this:

CS = SB x [ 1 / ( (player runs per times on base / league runs per times on base) x expected SB rate ) - 1 ]

Normally, the expected stolen base success rate applied is .55. This is averaged from known CS stats from neighboring seasons: 1914-1915 and 1920-1925.

David Laurila of Baseball Prospectus asked baseball historian Dan Levitt, the co-author (with Mark Armour) of Paths to Glory and a contributing author to SABR’s Deadball Stars of the American League and Deadball Stars of the National League, the following question on April 29:

Laurila: What is known about the success rate of steal attempts in the [Deadball] era?

Levitt: At a time when fewer home runs made scoring from first less likely, taking a greater risk to get into scoring position was acceptable. Although I can't offer a definitive answer on how much less the break-even percentage might have been during the deadball era, the actual success rate was roughly 55 percent. Managers were much more willing to sacrifice an out to get the runner to second.

In creating the Wiki Season CS estimates for seasons spanning 1896 to 1916, I wasn’t satisfied with simply assuming an expected SB rate of .55. It’s a fast way to an estimate, but it was part of the problem in creating inferior CS estimates for last year’s 1902 Wiki Season edition. As it turned out, I created not one but 23 different SB rates depending on player type and a player’s stolen base totals. Cumulatively, my CS estimates would ultimately reflect a league SB rate, but a league rate would not be part of the formula.
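The typical CS estimate formula quoted above can be sketched in Python as follows. This is only an illustration of that generic formula with the flat .55 rate, not the final method described in this note. The function name, the sample inputs, and the floor at zero are assumptions of the sketch; the formula as written goes negative when a runner's adjusted rate exceeds 1.0.

```python
def estimate_cs(sb: int, player_rate: float, league_rate: float,
                expected_sb_rate: float = 0.55) -> float:
    """Generic estimate: CS = SB * (1 / ((player / league) * rate) - 1).

    player_rate and league_rate are runs per times on base.
    """
    if league_rate <= 0 or player_rate <= 0:
        raise ValueError("rates must be positive")
    # Scale the expected success rate by the runner's speed vs. the league.
    adjusted_success = (player_rate / league_rate) * expected_sb_rate
    cs = (1.0 / adjusted_success - 1.0) * sb
    # Assumption of this sketch: never report a negative CS estimate.
    return max(cs, 0.0)


# Hypothetical league-average runner with 22 SB: 22 * (.45 / .55) = 18 CS.
print(round(estimate_cs(sb=22, player_rate=0.40, league_rate=0.40), 1))  # 18.0
```

A runner who scores more often than the league per time on base gets a higher adjusted success rate and therefore a lower CS estimate for the same SB total.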
(Some players stole no bases but may have been caught stealing or thrown out on a busted hit-and-run play. For these 4525 players, random numbers were generated, sorted into 17 sets determined by NFP.)

As I did with my new estimated strikeout figures for the Wiki Seasons, I sorted players by usage: regular, bench, and pitchers, an idea from Michael Schell’s Baseball's All-Time Best Sluggers. The Organizer note, Spawning Estimated Strikeouts for DMB, explains this idea and what NFPs are all about.

I did apply SB rates from 1914-1915 and 1920-1925, but I used them differently. I found that the players from these eight years could be sub-grouped by stolen base totals. For example, regular players with more stolen bases tend to have more success than players with fewer. It’s not a stretch to imagine that a hundred players each with sixty-plus stolen bases will tend to have much higher SB rates than a hundred players with only six each. So I began grouping players into subsets of 30 or more samples, graduated generally by diminishing SB rates, as the table shows:

Regulars:
SB       SB RATE
40+      .716
31-40    .651
24-30    .615
20-23    .574
(The four categories above include 1916 stats.)
17-19    .575
14-16    .542
11-13    .539
9-10     .521
8        .509
7        .489
5-6      .458
4        .409
3        .375
2        .326
1        .193

Bench:
SB       SB RATE
11-20    .647
6-10     .584
4-5      .563
3        .526
2        .495
1        .476

Pitchers:
SB       SB RATE
2-7      .707
1        .804

Since I averaged SB rates this way from eight seasons between 1914 and 1925, the caught-stealing formula’s league runs per times on base also reflects an average from all those years, and not a single season.

In February, while I was prorating stolen bases for 1894 to 1897 (modern stolen base records began being kept in 1898), I realized that having 1914, 1915, and 1920 to 1925 as the basis for league runs per times on base would not work as well as using the actual seasons being evaluated.
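The SB-rate bands in the table above lend themselves to a simple lookup that picks the expected SB rate for a player. The sketch below is one way to encode it; the data layout and function name are hypothetical. The table lists 40 steals under both "40+" and "31-40", so assigning 40 to the "40+" band is an assumption here, as is returning None for totals the table does not cover.

```python
from typing import Optional

# (low, high, rate); high=None marks the open-ended top band.
# Bands are checked in order, so 40 SB lands in "40+" (an assumption).
SB_RATE_BANDS = {
    "regular": [
        (40, None, 0.716), (31, 40, 0.651), (24, 30, 0.615), (20, 23, 0.574),
        (17, 19, 0.575), (14, 16, 0.542), (11, 13, 0.539), (9, 10, 0.521),
        (8, 8, 0.509), (7, 7, 0.489), (5, 6, 0.458), (4, 4, 0.409),
        (3, 3, 0.375), (2, 2, 0.326), (1, 1, 0.193),
    ],
    "bench": [
        (11, 20, 0.647), (6, 10, 0.584), (4, 5, 0.563),
        (3, 3, 0.526), (2, 2, 0.495), (1, 1, 0.476),
    ],
    "pitcher": [(2, 7, 0.707), (1, 1, 0.804)],
}


def expected_sb_rate(player_type: str, sb: int) -> Optional[float]:
    """Return the banded SB rate, or None if the table has no band for sb."""
    for low, high, rate in SB_RATE_BANDS[player_type]:
        if sb >= low and (high is None or sb <= high):
            return rate
    return None


print(expected_sb_rate("regular", 25))  # 0.615
print(expected_sb_rate("bench", 3))     # 0.526
```

Counting the bands gives 15 for regulars, 6 for bench players, and 2 for pitchers, which matches the 23 SB rates mentioned earlier.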
In the end, I rebuilt my estimated CS database with the league runs per times on base split into five player groups:

1894-1895 NL
1896-1897 NL
1898-1900 NL
1901-1919 NL
1901-1919 AL

The nineteenth-century seasons had to be sorted into three smaller sets instead of one 1890s group because of the volatility of this period; however, delineating the 1901-1919 Deadball era into two super league sets created a firmer foundation with more samples to average than my original eight-season set ranging from 1914 to 1925. The resulting estimates produced fewer outliers and more consistent values.

Since I had decided to make all the Deadball years, plus seven 1890s seasons, the basis for league runs, it was easy to consider creating CS for all 11,410 players from these many years with missing CS numbers. I finished the estimates, and I compiled the league totals for the first time. I was surprised the SB rates were as high as they were, considering the high CS totals I had seen for quite a few players during the many weeks I had doodled on this project.

Next, I evaluated the estimates by sorting the records by player careers. Many results had been rounded down so that the estimate groups would more closely match the SB rate of the control groups. I had made an assumption about keeping the rates similar between the two groups, and now I realized this was an unnecessary condition. Around this time, I discovered a few hot-shot base stealers like Collins, Cobb, and Carey had known CS stats for 1912 and 191; however, their success rates weren’t going to rewrite the record books.
Here’s how the estimates looked after round one in mid May:

Year  Lg    SB    CS   rate
1894  NL  2489  1601  0.609
1895  NL  2359  1579  0.599
1896  NL  2334  1463  0.615
1897  NL  2331  1527  0.604
1898  NL  2069  1479  0.583
1899  NL  2677  1634  0.621
1900  NL  1686  1103  0.605
1901  NL  1402   760  0.648
1902  NL  1362   905  0.601
1903  NL  1561   803  0.660
1904  NL  1573   963  0.620
1905  NL  1598  1024  0.609
1906  NL  1463  1028  0.587
1907  NL  1322  1012  0.566
1908  NL  1385  1100  0.557
1909  NL  1506  1136  0.570
1910  NL  1594  1114  0.589
1911  NL  1692  1040  0.619
1912  NL  1572   974  0.617
1913  NL  1578  1071  0.596
1914  NL  1440  1065  0.575
1915  NL  1191   995  0.545
1916  NL  1328  1094  0.548
1917  NL  1145   985  0.538
1918  NL  1029   788  0.566
1919  NL  1167   955  0.550
1901  AL  1451   603  0.706
1902  AL  1315   695  0.654
1903  AL  1154   707  0.620
1904  AL  1205   887  0.576
1905  AL  1328   908  0.594
1906  AL  1533  1088  0.585
1907  AL  1455  1091  0.571
1908  AL  1350   998  0.575
1909  AL  1545  1100  0.584
1910  AL  1670  1118  0.599
1911  AL  1722   997  0.633
1912  AL  1807  1084  0.625
1913  AL  1672  1126  0.598
1914  AL  1666  1375  0.548
1915  AL  1444  1040  0.581
1916  AL  1425  1172  0.549
1917  AL  1268  1048  0.547
1918  AL   960   843  0.532
1919  AL   912   771  0.542

The overall SB rate for all seasons was .593. For 1901 to 1919, the percentage was .589.

So after all this, I decided to rework the estimates by rounding most of them straight up. Then, the career numbers were creamier and more consistent. If the CS estimates were to be any lower, I would have had to fudge the numbers to match public perception, but I was happy with the figures.
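The rate column in the round-one table is steals divided by attempts, SB / (SB + CS). As a quick check, here is a minimal Python sketch that reproduces two rows from that table; the function name is hypothetical.

```python
def sb_success_rate(sb: int, cs: int) -> float:
    """Stolen base success rate: SB / (SB + CS)."""
    attempts = sb + cs
    if attempts == 0:
        raise ValueError("no steal attempts")
    return sb / attempts


# Two round-one league rows: (label, SB, estimated CS).
for label, sb, cs in [("1894 NL", 2489, 1601), ("1901 AL", 1451, 603)]:
    print(label, f"{sb_success_rate(sb, cs):.3f}")
# 1894 NL 0.609
# 1901 AL 0.706
```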
Here are the league SB rates from the final round:

Year  Lg     SB     CS   rate
1894  NL   2489   1643  0.602
1895  NL   2359   1644  0.589
1896  NL   2334   1516  0.606
1897  NL   2331   1609  0.592
1898  NL   2069   1522  0.576
1899  NL   2677   1690  0.613
1900  NL   1686   1139  0.597
1901  NL   1402    809  0.634
1902  NL   1362    975  0.583
1903  NL   1561    817  0.656
1904  NL   1573   1011  0.609
1905  NL   1598   1078  0.597
1906  NL   1463   1099  0.571
1907  NL   1322   1072  0.552
1908  NL   1385   1160  0.544
1909  NL   1506   1199  0.557
1910  NL   1594   1155  0.580
1911  NL   1692   1064  0.614
1912  NL   1572    987  0.614
1913  NL   1578   1109  0.587
1914  NL   1440   1124  0.562
1915  NL   1191    995  0.545
1916  NL   1328   1163  0.533
1917  NL   1145   1002  0.533
1918  NL   1029    837  0.551
1919  NL   1167    986  0.542
1901  AL   1451    621  0.700
1902  AL   1315    697  0.654
1903  AL   1154    734  0.611
1904  AL   1205    936  0.563
1905  AL   1328    948  0.583
1906  AL   1533   1131  0.575
1907  AL   1455   1136  0.562
1908  AL   1350   1053  0.562
1909  AL   1545   1174  0.568
1910  AL   1670   1178  0.586
1911  AL   1722   1019  0.628
1912  AL   1807   1114  0.619
1913  AL   1672   1201  0.582
1914  AL   1666   1375  0.548
1915  AL   1444   1040  0.581
1916  AL   1425   1153  0.553
1917  AL   1268   1097  0.536
1918  AL    960    911  0.513
1919  AL    912    796  0.534
1920  AL    752    707  0.515
1920  NL    969    862  0.529
1921  AL    685    545  0.557
1921  NL    803    771  0.510
1922  AL    681    515  0.569
1922  NL    755    634  0.544
1923  AL    741    606  0.550
1923  NL    824    657  0.556
1924  AL    749    581  0.563
1924  NL    754    659  0.534
1925  AL    711    582  0.550
1925  NL    672    517  0.565
1926  AL    664    509  0.566
TOT       69735  49719  0.584

The total above relates only to seasons from 1894 to 1919. The Deadball era SB rate, built on the estimated CS, was .580. The NL rate was .578, and the AL rate was .582.

Here are a few observations about the estimated league rates in light of those early baseball years. Catcher protection and defense don’t appear to improve significantly after 1894. I expected the nineteenth-century SB rates to be a little higher. Perhaps with fewer samples for compiling league runs per times on base for 19th-century seasons, the estimates aren’t as strong.
The estimates do indicate that the late 1890s was not an era for hog-wild base stealing.

Two items stand out about base-stealing rates in the early 1900s. One, the foul strike rule gave players fewer opportunities to steal. Two, the disruption to National League rosters by the nascent American League made both circuits uneven in talent, and good base stealers took advantage of weaker opposing catchers.

The new cork-center ball made batting averages jump in 1911 and 1912. Runs scored increased. My SB rates jump for both leagues and for both years, too. Whether the improvement in base stealing is an aberration of a formula that applies runs scored to CS estimates, or a natural consequence of higher on-base averages, has not been studied. One thing is for sure: by 1913, pitchers had the edge again with the introduction of the emery ball. For the rest of the decade, Deadball resumed and stolen base rates fell.

The late Deadball seasons with estimated CS values appear right in line with the known factors from 1914, 1915, and 1920-1925.

See the organizer note, “About the Wiki Seasons Project,” to learn more about the wiki concept for DMB database development.

Robert Bofors
July 1, 2007