Construction of baseball data (bball in REBayes package) June 21, 2015

# Run the following R files in the order listed.

> bball_webdata_collection_byYear.R
Collects data from the ESPN website, by year for all MLB teams, on players'
    number of at bats (AB),
    hits (H),
    walks (BB),
    on-base percentage (OBP), and
    an indicator for pitchers.
This generates the files A_2002.Rda, ..., A_2012.Rda.
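OBP is a ratio of counts, so values for different periods cannot be combined by simple averaging; the combined value has to be recomputed from the summed counts. The sketch below assumes the standard MLB definition OBP = (H + BB + HBP) / (AB + BB + HBP + SF), where hit-by-pitch (HBP) and sacrifice flies (SF) are not among the fields collected here. The function name obp and the toy counts are hypothetical, for illustration only.

```r
# Standard MLB on-base percentage from counting stats (assumed definition).
# H = hits, BB = walks, HBP = hit by pitch, AB = at bats, SF = sacrifice flies.
obp <- function(H, BB, HBP, AB, SF) {
  (H + BB + HBP) / (AB + BB + HBP + SF)
}

# Hypothetical counts for one player in two periods (illustration only).
p1 <- list(H = 30, BB = 10, HBP = 0, AB = 100, SF = 0)
p2 <- list(H = 2,  BB = 0,  HBP = 0, AB = 10,  SF = 0)

obp1 <- do.call(obp, p1)   # 40 / 110
obp2 <- do.call(obp, p2)   # 2 / 10

# Naive average of the two ratios ignores how many plate appearances each covers.
naive <- mean(c(obp1, obp2))

# Combined OBP: sum the counts first, then take the ratio.
pooled <- obp(p1$H + p2$H, p1$BB + p2$BB, p1$HBP + p2$HBP,
              p1$AB + p2$AB, p1$SF + p2$SF)
```

Here the naive average is about 0.282 while the pooled value is 0.35. Rebuilding OBP for a half season would therefore need HBP and SF as well as AB, H and BB, which illustrates one way a ratio statistic becomes awkward to carry through an aggregation step.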

> webdata_byYear_to_panel_2002_2012.R
Reshapes the data:
    1) aggregates records into half seasons
    2) handles duplicated player names
    3) trimming criterion: more than 10 at bats in each half season
    4) since OBP is hard to aggregate, keeps only AB, H, BB and the pitcher indicator
    5) further removes players with fewer than 3 records
Writes to: bball.Rda
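Steps 1, 3 and 5 of the reshaping above can be sketched in base R as follows. This is a minimal sketch, not the script itself: it assumes hypothetical game-level input with columns player, year and month, assumes the half-season split falls at July, reads the trimming rule as "keep half-season records with more than 10 at bats", and assumes player identifiers are already unique (step 2).

```r
# Hypothetical game-level records (illustration only); the column names
# and the game-level granularity are assumptions, not the ESPN fields.
games <- data.frame(
  player = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  year   = c(2002, 2002, 2002, 2002, 2003, 2003, 2002, 2002, 2002, 2002),
  month  = c(4, 5, 7, 8, 4, 8, 4, 5, 7, 9),
  AB     = c(6, 6, 7, 5, 12, 11, 3, 4, 20, 1),
  H      = c(2, 1, 3, 1, 4, 3, 1, 1, 6, 0),
  BB     = c(1, 0, 1, 0, 2, 1, 0, 0, 2, 0)
)

# 1) Aggregate counts into half seasons; splitting at July is an assumption.
games$half <- ifelse(games$month <= 6, 1L, 2L)
halves <- aggregate(cbind(AB, H, BB) ~ player + year + half,
                    data = games, FUN = sum)

# 3) Trim: keep half-season records with more than 10 at bats.
halves <- halves[halves$AB > 10, ]

# 5) Drop players left with fewer than 3 half-season records.
n_rec <- table(halves$player)
keep  <- names(n_rec)[n_rec >= 3]
panel <- halves[halves$player %in% keep, ]
```

In this toy input, player B keeps only one half season after trimming and is removed, while player A keeps four half-season records. Summing counts (rather than averaging rates) is what makes AB, H and BB straightforward to aggregate.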

> YOB_code.R
Adds year of birth (YOB). All MLB player HTML pages are downloaded with
wget, and year of birth is extracted from these pages.
The file profile.txt contains YOB and player names.
Writes to: bbally.Rda
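The extraction and merge described above might look like the following sketch. The "Born:" text pattern, the HTML snippets, the function name extract_yob and the column names are all hypothetical; the real player pages may be laid out differently, and the actual scripts read YOB from profile.txt. Merging on name also assumes names are unique, which is why duplicated names need to be resolved first.

```r
# Hypothetical HTML fragments standing in for downloaded player pages;
# the "Born:" pattern is an assumption, not the actual page layout.
pages <- c(
  "Jim Smith" = "<li><span>Born:</span> May 3, 1980 in Austin, TX</li>",
  "Bob Jones" = "<li><span>Born:</span> Oct 12, 1975 in Miami, FL</li>"
)

# Return the first 4-digit year (19xx or 20xx) after "Born:", or NA if absent.
extract_yob <- function(html) {
  pat <- "(?s)^.*?Born:.*?\\b((?:19|20)\\d{2})\\b.*$"
  out <- rep(NA_integer_, length(html))
  hit <- grepl(pat, html, perl = TRUE)
  out[hit] <- as.integer(sub(pat, "\\1", html[hit], perl = TRUE))
  out
}

# Hypothetical profile table (name, YOB), analogous to profile.txt.
profile <- data.frame(name = names(pages), YOB = extract_yob(pages))

# Attach YOB to a hypothetical panel by player name.
panel <- data.frame(name = c("Jim Smith", "Jim Smith", "Bob Jones"),
                    year = c(2002, 2003, 2002))
panel <- merge(panel, profile, by = "name", all.x = TRUE)
```

Using all.x = TRUE keeps panel rows whose player has no extracted YOB (they get NA), so missing birth years can be spotted and corrected by hand.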

> last_modification.R
Final fixes:
    1) manual corrections for a few players
    2) corrects the pitcher indicator (some players are recorded as pitchers
       for just one or two games, so we don't regard them as pitchers)
    3) adds the age and agesquare (age squared) variables
Saves to: bball.Rda
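Fixes 2) and 3) can be sketched as below. This assumes a hypothetical column games_p counting games a player appeared as a pitcher, an assumed cutoff of 2 games summed per player, and age defined simply as calendar year minus YOB; the column names age and age2 are also hypothetical.

```r
# Hypothetical panel after merging YOB; columns and values are illustrative.
panel <- data.frame(
  name    = c("Jim Smith", "Jim Smith", "Bob Jones", "Bob Jones"),
  year    = c(2002, 2003, 2002, 2003),
  pitcher = c(1, 0, 1, 1),
  games_p = c(1, 0, 30, 28),   # hypothetical: games appeared as pitcher
  YOB     = c(1980, 1980, 1975, 1975)
)

# A player who pitched in only a game or two overall is not treated as a
# pitcher; the cutoff of 2 games is an assumption.
min_games <- 2
total_p <- ave(panel$games_p, panel$name, FUN = sum)
panel$pitcher <- as.integer(total_p > min_games)

# Age as calendar year minus year of birth, plus its square.
panel$age  <- panel$year - panel$YOB
panel$age2 <- panel$age^2
```

Deciding pitcher status per player (via ave over all of that player's records) rather than per record keeps the indicator constant for each player across the panel.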

The REBayes package includes bball.rda, accessible from R with
require(REBayes); data(bball)
Its dimension is 11714 x 12 (11714 records, 12 variables), covering 1157 players, 188 of them pitchers.
Documentation is available as usual with ?bball from R.
