Introduction to the Lahman database

The database includes the following tables:

##  [1] "AllstarFull"         "Appearances"         "AwardsManagers"     
##  [4] "AwardsPlayers"       "AwardsShareManagers" "AwardsSharePlayers" 
##  [7] "Batting"             "BattingPost"         "CollegePlaying"     
## [10] "Fielding"            "FieldingOF"          "FieldingPost"       
## [13] "HallOfFame"          "Managers"            "ManagersHalf"       
## [16] "Master"              "Pitching"            "PitchingPost"       
## [19] "Salaries"            "Schools"             "SeriesPost"         
## [22] "Teams"               "TeamsFranchises"     "TeamsHalf"
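The listing above appears to be R console output naming the tables bundled with the Lahman R package. The command that produced it is not shown. The sketch below is one way to reproduce it and assumes the package is installed. The exact list depends on the installed release; for example, newer releases call the `Master` table `People`.

```r
# Assumes the Lahman package is installed: install.packages("Lahman")
library(Lahman)

# data(package = ...) returns a "packageIQR" object; its `results` matrix
# has an "Item" column with the name of every dataset the package ships.
tables <- data(package = "Lahman")$results[, "Item"]
print(tables)

# Each table is an ordinary data frame; Batting has one row per
# player, season, and stint (team).
str(Batting)
```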

Motivating questions

  1. Find all batters who have accumulated at least 3000 hits in their career.

  2. Find all batters who have accumulated at least 500 home runs in their career.

  3. Find all batters who have accumulated at least 300 home runs and at least 300 stolen bases in their career.

  4. Find all pitchers who have accumulated at least 300 wins and at least 3000 strikeouts in their career.

  5. Find all batters with at least 50 home runs in a single season.

  6. Find all batters who had at least 40 home runs and at least 40 stolen bases in a single season.

  7. Find the names of the 10 batters with the most hits during the 1980s.

  8. Find the names of all batters who have accumulated at least 3000 hits or at least 500 home runs but are not in the Hall of Fame.

  9. Plot the average strikeout rate in the AL and NL for each season since 1954.

  10. Consider Bill James’ famous “Pythagorean” model for expected winning percentage, \[ \hat{WPct} = \frac{1}{1 + (RA/RS)^k}, \] where \(WPct\) is the team's actual winning percentage, \(\hat{WPct}\) is its expected value, \(RS\) and \(RA\) are team runs scored and allowed, respectively, and \(k\) is an exponent (James originally used \(k = 2\)). Find the value of \(k\) that best fits team data since 1954.
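Questions 1 through 6 all reduce to summing counting statistics and filtering on thresholds. The following is a minimal sketch in base R, assuming the Lahman package is installed. It is one possible approach, not the original author's solution. Career totals sum over every season and stint. Single-season totals sum over stints within a year, because a player traded mid-season has one `Batting` row per team. Stolen bases are missing for some early seasons, so `NA`s are skipped in the sums. The results are player IDs; names can be merged in from the `People` table (`Master` in older releases).

```r
library(Lahman)

# Career batting totals: sum over every season and stint. SB is missing
# for some early seasons, so rows with NA are kept and NAs skipped in sum().
career <- aggregate(cbind(H, HR, SB) ~ playerID, data = Batting,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)

q1 <- subset(career, H >= 3000)               # 1. 3000+ career hits
q2 <- subset(career, HR >= 500)               # 2. 500+ career home runs
q3 <- subset(career, HR >= 300 & SB >= 300)   # 3. 300 HR and 300 SB

# Career pitching totals, same idea on the Pitching table.
pitching <- aggregate(cbind(W, SO) ~ playerID, data = Pitching,
                      FUN = sum, na.rm = TRUE, na.action = na.pass)
q4 <- subset(pitching, W >= 300 & SO >= 3000) # 4. 300 W and 3000 K

# Single-season totals: sum stints within a year so that a player
# traded mid-season is counted once with his combined numbers.
season <- aggregate(cbind(HR, SB) ~ playerID + yearID, data = Batting,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)
q5 <- subset(season, HR >= 50)                # 5. 50+ HR in a season
q6 <- subset(season, HR >= 40 & SB >= 40)     # 6. 40 HR and 40 SB

# Example: the 3000-hit club, most hits first.
print(q1[order(q1$H, decreasing = TRUE), ])
```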
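Questions 7 and 8 add two steps: attaching player names and excluding Hall of Fame members. Below is a sketch under stated assumptions. The Lahman package is installed. Names come from `People` (substitute `Master` on older releases). A player counts as "in the Hall of Fame" if any `HallOfFame` row for him has `inducted == "Y"`. Question 8 will also list recent players who are not yet eligible for election.

```r
library(Lahman)

# Player names live in People (called Master in older Lahman releases).
player_names <- People[, c("playerID", "nameFirst", "nameLast")]

# 7. Hits summed over the 1980-1989 seasons, all stints included.
eighties <- subset(Batting, yearID >= 1980 & yearID <= 1989)
hits80 <- aggregate(H ~ playerID, data = eighties, FUN = sum)
top10 <- head(hits80[order(hits80$H, decreasing = TRUE), ], 10)
top10 <- merge(top10, player_names, by = "playerID")
top10 <- top10[order(top10$H, decreasing = TRUE), ]  # merge() re-sorts by key
print(top10)

# 8. Career milestones, minus anyone ever inducted into the Hall of Fame.
career <- aggregate(cbind(H, HR) ~ playerID, data = Batting,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)
inducted <- unique(HallOfFame$playerID[HallOfFame$inducted == "Y"])
q8 <- subset(career, (H >= 3000 | HR >= 500) & !(playerID %in% inducted))
q8 <- merge(q8, player_names, by = "playerID")
print(q8)
```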
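Questions 9 and 10 work at the team level, using the `Teams` table. The following is a sketch with two assumptions the questions leave open. First, "strikeout rate" is measured as batter strikeouts per team game; strikeouts per plate appearance would be another reasonable choice. Second, the "optimal" \(k\) minimizes the sum of squared differences between actual and predicted winning percentage over all team-seasons since 1954. An alternative uses the fact that the model implies \(W/L = (RS/RA)^k\), so \(k\) can also be estimated by regressing \(\log(W/L)\) on \(\log(RS/RA)\) with no intercept. In `Teams`, `R` is runs scored and `RA` is runs allowed.

```r
library(Lahman)

modern <- subset(Teams, yearID >= 1954)

# 9. Strikeouts by batters per team game, pooled over each league's teams.
k_rate <- aggregate(cbind(SO, G) ~ yearID + lgID, data = modern, FUN = sum)
k_rate$SO_per_G <- k_rate$SO / k_rate$G
al <- subset(k_rate, lgID == "AL")
nl <- subset(k_rate, lgID == "NL")
al <- al[order(al$yearID), ]
nl <- nl[order(nl$yearID), ]
plot(al$yearID, al$SO_per_G, type = "l", col = "red",
     ylim = range(k_rate$SO_per_G),
     xlab = "Season", ylab = "Strikeouts per team game")
lines(nl$yearID, nl$SO_per_G, col = "blue")
legend("topleft", legend = c("AL", "NL"), col = c("red", "blue"), lty = 1)

# 10. Choose k to minimize the squared error between actual and
# Pythagorean winning percentage over all team-seasons since 1954.
wpct <- modern$W / (modern$W + modern$L)
sse <- function(k) sum((wpct - 1 / (1 + (modern$RA / modern$R)^k))^2)
k_hat <- optimize(sse, interval = c(1, 3))$minimum
print(k_hat)
```

A one-dimensional search with `optimize()` is enough here because the fit has a single parameter. A least-squares fit on the winning-percentage scale and the log-ratio regression can give slightly different values of \(k\).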