1
SLASHPack Collector: Performance Improvement and Evaluation
  • Rudd Stevens
  • CS 690
  • Spring 2006

2
Outline
  • 1. Introduction, system overview and design.
  • 2. Performance modifications, re-factoring and
    re-structuring.
  • 3. Performance testing results and evaluation.

3
Outline
  • 1. Introduction, system overview and design.
  • 2. Performance modifications, re-factoring and
    re-structuring.
  • 3. Performance testing results and evaluation.

4
Introduction
  • SLASHPack Toolkit
    (Semi-LArge Scale Hypertext Package)
  • Sponsored by Prof. Chris Brooks; engineered for
    initial clients Nancy Montanez and Ryan King.
  • Collector component: a framework for collecting
    documents.
  • Goal: evaluate and improve the Collector's
    performance.

5
Contact and Information Sources
  • Contact Information
    Rudd Stevens, rstevens@cs.usfca.edu
  • Project Website
    http://www.cs.usfca.edu/rstevens/slashpack/collector/
  • Project Sponsor
    Professor Christopher Brooks, Department of
    Computer Science, University of San Francisco
    cbrooks@cs.usfca.edu

6
Stages
  • Addition of a protocol module for the Weblog data set.
  • Performance testing using the Weblog and HTTP
    modules; identification of problem areas.
  • Modification of the Collector to improve scalability
    and performance.
  • Repeated performance testing and evaluation of the
    improvements.

7
Implementation
  • Language: Python
  • Platform: any OS with Python support,
    Python 2.4 or later.
    (Developed and tested under Linux.)
  • Progress: fully built, newly re-factored for
    performance and usability.

8
High level design
  • SLASHPack is designed as a framework.
  • Modular components that contain sub-modules.
  • The Collector is pluggable: protocol modules,
    parsers, filters, output writers, etc. (see the
    sketch below).
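
The slides do not show the Collector's actual module interface; the sketch
below only illustrates the pluggable-framework idea, and every class and
method name in it is hypothetical, not SLASHPack code.

    # Minimal sketch of a pluggable protocol module (illustrative names only).
    import urllib2

    class ProtocolModule(object):
        """Interface each protocol module (HTTP, Weblog, ...) implements."""
        def fetch(self, url):
            raise NotImplementedError

    class HttpModule(ProtocolModule):
        def fetch(self, url):
            # fetch a document over HTTP with the standard library
            return urllib2.urlopen(url).read()

    class Collector(object):
        """The framework talks only to module interfaces, so protocol
        modules, parsers, filters and output writers can be swapped."""
        def __init__(self, protocol, parsers=(), filters=(), writers=()):
            self.protocol = protocol
            self.parsers = parsers
            self.filters = filters
            self.writers = writers

        def collect(self, url):
            doc = self.protocol.fetch(url)
            for parser in self.parsers:
                doc = parser.parse(doc)
            for filt in self.filters:
                if not filt.accept(doc):
                    return
            for writer in self.writers:
                writer.write(doc)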

9
High level design (cont.)
10
Outline
  • 1. Introduction, system overview and design.
  • 2. Performance modifications, re-factoring and
    re-structuring.
  • 3. Performance testing results and evaluation.

11
Performance Testing
  • Large-scale text collection
    Weblog data set.
    Long web crawls.
  • Performance testing / monitoring
    Python profiling (see the sketch below).
    Integrated statistics.
  • Functionality testing
    Python logging.
    Functionality test runs.
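
The slides only name "Python profiling"; a minimal way to do this with the
standard-library profile and pstats modules is sketched below. The
fetch_batch function is a stand-in for a real Collector run, not SLASHPack
code.

    # Profile a (stand-in) collection run with the standard library.
    import profile
    import pstats
    import urllib2

    def fetch_batch(urls):
        # stand-in for a Collector run: fetch each URL and discard the body
        for url in urls:
            try:
                urllib2.urlopen(url).read()
            except Exception:
                pass

    profile.run('fetch_batch(["http://www.cs.usfca.edu/"])', 'collector.prof')

    stats = pstats.Stats('collector.prof')
    stats.sort_stats('cumulative')   # order by cumulative time
    stats.print_stats(20)            # show the 20 most expensive calls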

12
Collector Runtime Statistics
  • UrlFrontier
    Url Frontier size (current number of links): 3465
    Urls requested from frontier: 659
    Url Frontier, current number of server queues: 78
    Urls delivered from frontier: 639
  • Collector
    Documents per second: 3.70328865405
    Total runtime: 2 minutes 31.4869761467 seconds
  • UrlSentry
    Urls filtered using robots: 38
    Urls filtered for depth: 9
    Urls processed: 5881
    Urls filtered using filters: 165
  • UrlBookkeeper
    Duplicate Urls: 1557
    Urls recorded: 4104

13
Collector Runtime Statistics
  • DocFingerprinter
    Documents written: 386
    Average document size (bytes): 20570
  • HTTP status responses
    200: 394    204: 10
    301: 8      302: 25
    404: 91     403: 7
    401: 1      400: 24
    500: 1
  • Duplicate documents: 51
  • Total documents collected: 561
  • Documents by mimetype
    text/xml: 1      image/jpeg: 1
    text/html: 451   image/gif: 1
    text/plain: 106  application/octet-stream: 1

14
Challenges
  • Large text (XML) files
    21 XML files of 1 GB each.
    450,000 files per XML file.
    10 million files after processing.
  • Memory/Storage
    Disk space.
    Memory usage during (XML) processing.

15
Weblog raw data
    <post>
      <weblog_url> http://www.livejournal.com/users/chuckdarwin </weblog_url>
      <weblog_title> "Evolve!" </weblog_title>
      <permalink>http://www.livejournal.com/users/chuckdarwin/1001264.html</permalink>
      <post_title> Flickr </post_title>
      <author_name> Darwin (chuckdarwin) </author_name>
      <date_posted> 2005-07-09 </date_posted>
      <time_posted> 00:00:00 </time_posted>
      <content> <html><head><meta content="text/html; charset=UTF-8"
        http-equiv="Content-Type"/><title>"Evolve!"</title></head><body>
        <div style="text-align: center"><font size="1"><a
        href="http://www.nytimes.com/2005/07/09/arts/09boxe.html?ei=5088&amp;en=61cfcd5835008b1a&amp;ex=1278561600&amp;partner=rssnyt&amp;emc=rss&amp;pagewanted=print">7/7
        and 9/11?</a></font></div></body></html>
      </content>
      <outlinks>
        <outlink>
          <url> http://www.nytimes.com/2005/07/09/arts/09boxe.html </url>
          <site> http://www.nytimes.com </site>
          <type> Press </type>
        </outlink>
      </outlinks>
    </post>

16
Weblog processed data
    <spdata>
      <url>http://www.livejournal.com/users/chuckdarwin</url>
      <date>20060212</date>
      <crawlname>WeblogPosts20050709</crawlname>
      <weblog>
        <weblog_title>"Evolve!"</weblog_title>
        <permalink>http://www.livejournal.com/users/chuckdarwin/1001264.html</permalink>
        <post_title>Flickr</post_title>
        <author_name>Darwin (chuckdarwin)</author_name>
        <date_posted>2005-07-09</date_posted>
        <time_posted>00:00:00</time_posted>
        <outlinks>
          <outlink>
            <type>Press</type>
            <url>http://www.nytimes.com/2005/07/09/arts/09boxe.html</url>
            <site>http://www.nytimes.com</site>
          </outlink>
        </outlinks>
      </weblog>
      <tags></tags>
    </spdata>

17
Original Design
18
Problems to Address
  • Overall collection performance
    Streamline processing.
  • Robot file look-up
    Incredibly slow and inefficient. (Not mine!)
    (One standard remedy is sketched below.)
  • Thread interaction
    Efficient use of threads and queues to process data.
  • Inefficient code
    Python code is not always the fastest.
    miniDom XML parsing.
  • Faster data structures
    Re-work collection protocols, DNS pre-fetch.
    Re-structure URL Frontier, URL Bookkeeper.
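
The slides do not show how the robots.txt look-up was actually fixed; the
sketch below shows one standard approach, caching a parsed robots.txt per
host with the standard-library robotparser module. The function and cache
names are illustrative, not SLASHPack code.

    # Cache one parsed robots.txt per host so each host is fetched only once.
    import robotparser            # urllib.robotparser in Python 3
    import urlparse               # urllib.parse in Python 3

    _robot_cache = {}

    def allowed(url, agent='SLASHPack'):
        host = urlparse.urlsplit(url)[1]      # network location of the URL
        parser = _robot_cache.get(host)
        if parser is None:
            parser = robotparser.RobotFileParser()
            parser.set_url('http://%s/robots.txt' % host)
            parser.read()                     # single network fetch per host
            _robot_cache[host] = parser
        return parser.can_fetch(agent, url)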

19
New Design
20
Performance Modifications
  • Structure re-design (threading)
    More queues, more independence.
  • Robot parser
    String creation, debug calls.
  • URL Frontier
    More efficient data structures.
  • Protocol modules
    More efficient data structures.
    Re-factoring for reliable collection.
  • XML parsing
    Switch to a faster parser, removal of the DOM parser
    (see the sketch below).
  • DNS pre-fetching
    More efficient structuring.
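
The slides do not name the parser that replaced miniDom. As an illustration
only, the sketch below uses ElementTree's iterparse, a common memory-friendly
way to stream through a large Weblog XML file one <post> element at a time;
the file name in the usage note is hypothetical.

    # Stream a large Weblog XML file instead of building a full DOM tree.
    # (Illustrative only; the slides do not say which parser was adopted.
    # The C-accelerated cElementTree was the usual choice in the Python 2.4 era.)
    import xml.etree.ElementTree as ET

    def iter_posts(path):
        for event, elem in ET.iterparse(path, events=('end',)):
            if elem.tag == 'post':
                yield elem        # hand one post to the caller
                elem.clear()      # release the subtree so memory stays flat

    # usage (hypothetical file name):
    # for post in iter_posts('weblog-2005-07-09.xml'):
    #     handle(post)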

21
New data structures
  • Dictionary fields for the Base data type
    (must be implemented by any data protocol).
  • Now passed as a dictionary to the storage component
    (see the example below).

    Key           Value                         Type
    datatype      user-defined datatype name    string
    status        HTTP document status          string
    url           URL of document               string
    date          collection date               string
    crawlname     name of current crawl         string
    size          byte length of content        string
    mimetype      mime type of document         string
    fingerprint   md5sum hash of content        string
    content       raw text of document          string
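
For illustration, a record with the fields listed above could be built like
this; the field names come from the slide, while the helper function and the
sample values are hypothetical.

    # Build a document record with the Base data type's dictionary fields.
    # (Field names from the slide; function name and values are illustrative.)
    import md5      # hashlib.md5 on later Pythons
    import time

    def make_record(url, content, status, mimetype, crawlname, datatype='http'):
        return {
            'datatype':    datatype,                      # user-defined datatype name
            'status':      str(status),                   # HTTP document status
            'url':         url,                           # URL of document
            'date':        time.strftime('%Y%m%d'),       # collection date
            'crawlname':   crawlname,                     # name of current crawl
            'size':        str(len(content)),             # byte length of content
            'mimetype':    mimetype,                      # mime type of document
            'fingerprint': md5.new(content).hexdigest(),  # md5sum hash of content
            'content':     content,                       # raw text of document
        }

    record = make_record('http://www.cs.usfca.edu/', '<html>...</html>',
                         200, 'text/html', 'TestCrawl')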

22
Outline
  • 1. Introduction, system overview and design.
  • 2. Performance modifications, re-factoring and
    re-structuring.
  • 3. Performance testing results and evaluation.

23
Performance Comparison
  • Initial results
    Weblog data set
      without parsing, robots: 161 doc/s, 50 min.
      with parsing, robots: 3.9 doc/s, 162 min. (killed)
    HTTP web crawl
      100 docs with parsing, robots: 0.2 doc/s, 16 min 13 s
      150 docs with parsing, robots: 0.3 doc/s, 21 min 3 s
  • Modified results
    Weblog data set
      without parsing, robots: 170 doc/s, 42 min.
      with parsing, robots: 186 doc/s, 63 min.
    HTTP web crawl
      100 docs with parsing, robots: 2.2 doc/s, 1 min 10 s
      150 docs with parsing, robots: 2.9 doc/s, 1 min 14 s

24
Performance Comparison (cont.)
  • Hardware considerations: HTTP web crawl for 500
    documents.
  • Pentium 4, 2.4 GHz, 1 GB RAM (faster connection):
    3.7 doc/s, 3 min 18 s, 728 docs total.
  • Pentium 4, 2.0 GHz, 1 GB RAM:
    3.7 doc/s, 4 min 25 s, 725 docs total.
  • Pentium 4, 3.2 GHz HT, 2 GB RAM (faster connection):
    4.3 doc/s, 2 min 47 s, 717 docs total.

25
Performance Comparison (cont.)
  • Comparison to other web crawlers
    (published results, 1999):
    Google: 33.5 doc/s
    Internet Archive: 46.3 doc/s
    Mercator: 112 doc/s
  • Consideration of functionality
    More than just a web crawler.
    Multiple MIME types.

26
Available Documentation
  • API documentation (pydoc-style), generated with
    Epydoc.
  • Use and configuration guide (README).
  • Quick-start guide.
  • Full report: full specification of the Collector,
    its use and configuration, and development
    background.

27
Future Work
  • Addition of pluggable modules.
  • Improved fingerprint sets.
  • Improved Python memory management and threading.

28
References
  • Allan Heydon and Marc Najork. Mercator: A scalable,
    extensible web crawler.
    http://research.compaq.com/SRC/mercator/papers/www/paper.pdf
  • Soumen Chakrabarti. Mining the Web, 2002.
    Ch. 2, pages 17-43.
  • Heritrix, Internet Archive.
    http://crawler.archive.org/
  • Python Performance Tips.
    http://wiki.python.org/moin/PythonSpeed/PerformanceTips
  • Prof. Chris Brooks and the SLASHPack Team.

29
Conclusion
  • Four stages
    Addition of a protocol module for the Weblog data set.
    Performance testing and identification of problem areas.
    Modification of the Collector to improve scalability
    and performance.
    Repeated performance testing and evaluation of the
    improvements.
  • Results
    Expanded functionality for data types.
    Modifications improved performance.
    More stable and flexible design.