Sports Data Sources and Data Extraction - PowerPoint PPT Presentation

1
Sports Data Sources and Data Extraction
  • Gavin Zhang
  • MIS580
  • University of Arizona
  • 02-06-2008

2
Outline
  • Sports Data Sources
  • Baseball
  • Basketball
  • Football
  • Olympics
  • Greyhound
  • Data Extraction
  • Case Study: AZGreyhound System

3
Baseball Data Source
Download the database
  • http://www.baseball1.com/

4
Data Download
  • This database contains pitching, hitting, and
    fielding statistics for Major League Baseball
    from 1871 through 2007.
  • The data are provided in Microsoft Access, CSV,
    and other formats.
  • The newest version is Version 5.5.
  • The database can be downloaded at
  • http://baseball1.com/content/view/57/82/

5
Database
AwardPlayers.csv
  • A detailed description of the database is
    available at
  • http://baseball1.com/content/view/57/82/
  • The database has 21 tables. The main tables
    include
  • MASTER Table: player names, dates of birth, and
    biographical info
  • Batting Table: batting statistics
  • Pitching Table: pitching statistics
  • Fielding Table: fielding statistics
  • A detailed description of each data field in
    each table is available.


6
Basketball Data Source
Download all of the player and team statistics
  • http://databaseBasketball.com/

7
Data Download
  • The website contains NBA data from 1947 to 2007
    and ABA data from 1968 to 1976 on players,
    teams, leagues, all-star games, awards, and
    coaches.
  • Download at
  • http://databasebasketball.com/stats_download.htm

8
Database
Teams.txt
team|location|name|leag
ANA|Anaheim|Amigos|A
AND|Anderson|Duffey Packers|N
ATL|Atlanta|Hawks|N
BA1|Baltimore|Bullets|N
BAL|Baltimore|Bullets|N
BOS|Boston|Celtics|N
BUF|Buffalo|Braves|N
CAP|Capital|Bullets|N
CAR|Carolina|Cougars|A
CH1|Chicago|Stags|N
CH2|Chicago|Zephyrs|N
CHA|Charlotte|Hornets|N
CHI|Chicago|Bulls|N
  • This download contains nine delimited text files
    (.txt format), each of which represents a table
    in the database.
  • If you open the files in Excel, you may need to
    select Data > Text to Columns, then use the bar
    ("|") character as the delimiter.
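Outside Excel, the same bar-delimited files can be read with a few lines of code. The following is a minimal Java sketch, not part of the original slides: the class and method names are hypothetical, and the sample rows assume a four-column layout (team, location, name, league) with a header line first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal sketch (hypothetical names): split bar-delimited lines such as those in Teams.txt. */
final class TeamsParser {

    /** Splits one line on the bar character; the -1 limit keeps trailing empty fields. */
    static String[] parseLine(String line) {
        // The bar is a regex metacharacter, so it must be escaped.
        return line.split("\\|", -1);
    }

    /** Parses every non-empty line after the header row into an array of fields. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is assumed to be the header
            String line = lines.get(i).trim();
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "team|location|name|leag",
            "ATL|Atlanta|Hawks|N",
            "AND|Anderson|Duffey Packers|N");
        for (String[] row : parse(sample)) {
            System.out.println(row[0] + ": " + row[1] + " " + row[2] + " (" + row[3] + ")");
        }
    }
}
```

Splitting on the escaped bar keeps multi-word fields such as "Duffey Packers" intact, which a whitespace split would break apart.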

9
Football Data Source
  • http://www.pro-football-reference.com/

10
Data Download
  • A copy of the data set (in CSV format) can be
    downloaded from
    http://ai.arizona.edu/hchen/chencourse/SportsData/Pro-football-refernce_CSV.zip
  • This version contains game data from 1995 to
    2006. The data set covers 64,327 players and the
    games they played in.
  • Tables include
  • Master: information about players
  • Seasons: players' statistics by season
  • Games: players' statistics by game
  • A detailed description of each data field in
    each table is available.

11
Database
Master.csv

12
Some Other Football Data Sources
  • http://www.databasefootball.com/
  • The website contains National Football League
    (NFL) data from 1922 to 2005 and American
    Football League (AFL) data from 1960 to 1969 on
    players, teams, leagues, awards, and coaches.
  • The data set cannot be downloaded directly. The
    data must be extracted from the HTML pages with
    a parsing program.
  • http://www.jt-sw.com/football/
  • The website contains NFL player and coach
    statistics from 1920 to the present and AFL
    statistics from 1960 to 1969.
  • The data set cannot be downloaded directly. The
    data must be extracted from the HTML pages with
    a parsing program.
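As an illustration of what such a parsing program might look like, here is a minimal Java sketch that pulls the cell text out of an HTML table using only the standard library. It is not from the slides: the class name, the method name, and the sample markup are hypothetical, and the actual page structure of either site is not assumed. Regular expressions are adequate only for simple, regular markup; a dedicated HTML parser is the more robust choice for real pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch (hypothetical names): extract table cell text from simple HTML. */
final class HtmlTableSketch {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern ROW = Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", FLAGS);
    private static final Pattern CELL = Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns one list of cell strings per table row found in the HTML. */
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop nested tags (links, bold) and surrounding whitespace.
                cells.add(TAG.matcher(cell.group(1)).replaceAll("").trim());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up markup for illustration only.
        String html = "<table><tr><th>Player</th><th>Team</th></tr>"
            + "<tr><td><a href=\"p1.htm\">Player One</a></td><td> XYZ </td></tr></table>";
        for (List<String> r : extractRows(html)) {
            System.out.println(String.join("|", r));
        }
    }
}
```

Each extracted row can then be written out as a delimited line or inserted into a database table.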

13
Olympics Data Source
  • http://www.databaseolympics.com/

14
Data Format
  • DatabaseOlympics.com is your source for every
    Summer and Winter Olympics medal winner.
  • Summer Olympics: 1896-2004
  • Winter Olympics: 1924-2002
  • You'll find every medal winner for every country
    with easy links to each Olympics, sports, and
    athletes.

15
Data Format

16
Greyhound
  • http://66.236.122.233:8080/tracklink/

17
Data Format
  • Data includes daily race programs (videos) and
    odds charts (.txt file format) for all US
    Greyhound tracks.
  • Some tracks had both Afternoon and Evening
    programs.

18
Data Format
Chart.txt
1st  Grade B  Distance 550  Condition Fast
DOG               WT    P  O  1/8  Str  Fin    Time   Odds   Comment
PTL Jane          63.5  6  3  1    1    1 ns   32.00  11.60  Held At Wire Inside
Silver Speck      68.5  1  1  2    2    2 ns   32.01   2.80  Cutff 1st, Stayd Cls
Jain't It Doug    75    7  7  6    6    3 1.5  32.10   7.50  Closed For Show Outs
Flyer Whitesocks  75.5  8  8  7    3    4 1.5  32.11   2.30  In The Hunt
Flying Detroit    69    5  5  4    4    5 2    32.15   9.00  Not Far Behind Mdtrk
VP Twix Twizala   59.5  3  4  3    5    6 4.5  32.31   4.20  Losing Position Ins
Sergio            73    4  6  5    7    7 5    32.34  13.30  Blocked 1st Turn
Heartattack Jack  71.5  2  2  8    8    8 5.5  32.39   7.10  Bumped 1st Turn

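A chart row like those above can be split into typed fields with a regular expression. The Java sketch below is an illustration, not code from the slides. It assumes whitespace-separated fields in the order dog name, weight, P, O, 1/8, Str, finish position, finish margin, time, odds, comment, and it assumes that dog names contain no standalone numbers. Reading the Fin column as a position followed by a margin (such as "ns") is an interpretation of the sample data. The class and field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch (hypothetical names): split one chart row into typed fields. */
final class ChartLineSketch {
    // Assumed column order: DOG WT P O 1/8 Str Fin(position, margin) Time Odds Comment
    private static final Pattern ROW = Pattern.compile(
        "^(.+?)\\s+(\\d+(?:\\.\\d+)?)"               // dog name, weight
        + "\\s+(\\d)\\s+(\\d)\\s+(\\d)\\s+(\\d)"     // P, O, 1/8, Str
        + "\\s+(\\d)\\s+(\\S+)"                      // finish position, margin
        + "\\s+(\\d+\\.\\d{2})\\s+(\\d+\\.\\d{2})"   // time, odds
        + "(?:\\s+(.*))?$");                         // optional comment

    final String dog;
    final double weight;
    final int post;
    final int finish;
    final String margin;
    final double time;
    final double odds;
    final String comment;

    private ChartLineSketch(Matcher m) {
        dog = m.group(1);
        weight = Double.parseDouble(m.group(2));
        post = Integer.parseInt(m.group(3));
        finish = Integer.parseInt(m.group(7));
        margin = m.group(8);
        time = Double.parseDouble(m.group(9));
        odds = Double.parseDouble(m.group(10));
        comment = m.group(11) == null ? "" : m.group(11);
    }

    /** Parses one row, or throws if it does not fit the assumed layout. */
    static ChartLineSketch parse(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized chart row: " + line);
        }
        return new ChartLineSketch(m);
    }

    public static void main(String[] args) {
        ChartLineSketch r = parse("PTL Jane 63.5 6 3 1 1 1 ns 32.00 11.60 Held At Wire Inside");
        System.out.println(r.dog + " | finish " + r.finish + " | odds " + r.odds);
    }
}
```

The lazy match on the dog name lets multi-word names through, because the name ends at the first point where the numeric columns that follow fit the pattern.
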
19
Case Study: AZGreyhound System
  • By Rob Schumaker

20
AZGreyhound System Design
Components shown in the system diagram:
  • Greyhound Data (Race Data, Odds Data)
  • AZGreyhound DB
  • Model Building (Training / Testing)
  • Prediction
  • Betting Engine
  • Traditional bets: Win, Place, Show
  • Straight Bets: Exacta, Trifecta, Superfecta
  • Box Bets: Quiniela, Trifecta, Superfecta
  • Metrics: Accuracy, Payout, Efficiency
  • AZGreyhound System

21
Greyhound Data Extraction
  • Greyhound data were gathered from
    www.trackinfo.com. The website links to
  • GreyMatter: http://66.236.122.233:8080/tracklink/
  • TrackInfo: http://www.trackinfo.com/index2.html
  • The race and odds data were parsed into a SQL
    Server database, and the data were then sent to
    the AZGreyhound system for prediction.

22
Example code
public void RacePrograms() throws Exception {
    // This method picks up the overall race
    // information and puts it in the database.
    ... ...
    // Data parsing URL
    String URL1 = "http://www.trackinfo.com/trakdocs/hound/";
    String URL2 = "/Rpages";
    ... ...
    OpenConnection2();
    try {
        ... ...
        TrackAbbrev = rSet.getString("TrackAbbrev");
        String URL = URL1 + TrackAbbrev + URL2;
        Feed = web.Scraper(URL, 1);
        ... ...
        NumItems = web.NumItems(Feed, "icons/html.gif");
        for (int y = 1; y ...) {
            // Parsing out each data field
            Feed = Feed.substring(Feed.indexOf("icons/html.gif"));
            FileName = web.ExtractText(Feed, "");
            Feed = Feed.substring(Feed.indexOf("..."));
            FileDate = web.ExtractText(Feed, "NOWRAP", "");
            FileContents = web.Scraper(URL + "/" + FileName, 1);
            FileContents = FileContents.replaceAll("'", "-");
            // Insert into DB
            db.Insert2DBProgram(FileName, FileDate, FileContents);
        }
        CloseConnection2();
    } catch (SQLException e) {
        System.out.println(e);
    }
}

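The example code relies on helpers such as web.NumItems and web.ExtractText whose bodies are not shown on the slide. The self-contained Java sketch below illustrates the same marker-based technique: count the occurrences of a marker string, jump to each one with indexOf and substring, and take the text between two delimiters. The names countOccurrences, extractBetween, and fileNames are hypothetical stand-ins rather than the original helpers, and the sample listing markup is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch (hypothetical names): marker-based scanning of a directory listing page. */
final class ListingScanSketch {

    /** Counts non-overlapping occurrences of marker in text. */
    static int countOccurrences(String text, String marker) {
        int count = 0;
        int from = text.indexOf(marker);
        while (from >= 0) {
            count++;
            from = text.indexOf(marker, from + marker.length());
        }
        return count;
    }

    /** Returns the text between the first start delimiter and the next end delimiter, or null. */
    static String extractBetween(String text, String start, String end) {
        int s = text.indexOf(start);
        if (s < 0) {
            return null;
        }
        s += start.length();
        int e = text.indexOf(end, s);
        return e < 0 ? null : text.substring(s, e);
    }

    /** Collects the link target that follows each marker in the feed. */
    static List<String> fileNames(String feed, String marker) {
        List<String> names = new ArrayList<>();
        int items = countOccurrences(feed, marker);
        String rest = feed;
        for (int y = 1; y <= items; y++) {
            rest = rest.substring(rest.indexOf(marker)); // jump to the next marker
            String name = extractBetween(rest, "href=\"", "\"");
            if (name != null) {
                names.add(name);
            }
            rest = rest.substring(marker.length());      // step past this marker
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up listing markup for illustration only.
        String feed = "<img src=\"icons/html.gif\"> <a href=\"A1.HTML\">A1.HTML</a>\n"
            + "<img src=\"icons/html.gif\"> <a href=\"A2.HTML\">A2.HTML</a>";
        System.out.println(fileNames(feed, "icons/html.gif"));
    }
}
```

Each file name found this way could then be fetched and stored, as the slide's loop does with web.Scraper and db.Insert2DBProgram.
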
23
You can use the sports data sources introduced in
this set of slides for your data mining
project. You are strongly encouraged to identify
other interesting public sports data sets for
your project.
Thanks!