Title: Tasks
2 Blackboard
 - Contains output of current run, plus all past output
 - Effectively contains ER graph
 (Diagram labels: Tasks, Apps)
3 Top Level
 - src/: System scripts, such as setup.pl, run.pl, etc.
 - cfg/: System config scripts, interpreted by src/run.pl
 - log/: Global log files
 - repos/: Subversion repositories for crawled data and past output
 - data/: Global data, such as crawled data, the last run's output, DBLP XML, safe data
 - utils/: Global utilities, such as Date, Log, CrawledDataAccess, OutputAccess
 - tasks/: Modules that perform tasks, as defined earlier
 - apps/: Modules that perform apps, as defined earlier
Modules
 - src/: Various implementations, i.e., different ways of performing the module's function. Each is a Perl script called by src/run.pl based on a config file.
 - log/: The local log for this module.
 - output/: The XML output of this module, accessible to all other modules. These output/ directories, plus the past output repository repos/outputArchiveRepo (checked out to data/outputArchive/), compose the blackboard.
 - /: Some modules require other directories. Though nothing enforces it, other modules should not access them directly. The owning module is responsible for maintaining them.
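As a rough illustration of how the blackboard is composed, the sketch below collects the XML files from every module's output/ directory together with the checked-out archive in data/outputArchive/. It is written in Python rather than the system's Perl. The function name blackboard_files and the assumption that each module sits one level below tasks/ or apps/ are hypothetical, not taken from the original system.

```python
from pathlib import Path


def blackboard_files(root: Path) -> dict:
    """Group the XML files that make up the blackboard by where they come from."""
    current = []
    for group in ("tasks", "apps"):
        # Assumed layout (not confirmed by the source): <root>/<group>/<module>/output/*.xml
        current.extend(sorted((root / group).glob("*/output/*.xml")))
    # Past output, as checked out from repos/outputArchiveRepo into data/outputArchive/.
    past = sorted((root / "data" / "outputArchive").rglob("*.xml"))
    return {"current": current, "past": past}
```

Only output/ directories and the archive are scanned, so a module's log/ or private directories never appear on the blackboard, which matches the convention that other modules should not read them directly.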
4 src/run.pl Workflow
 - If no config files are given, parse schedule.xml to find config files
 - Parse config files to determine which modules will be executed
 - Clear output/ directories of modules that will be executed
 - Execute modules in order as separate Perl processes, passing each its own directory and any arguments specified in the config file
 - Archive output/ directories of executed modules in output repository
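The last three workflow steps (clear, execute, archive) can be sketched as follows. This is a minimal Python stand-in, not the actual src/run.pl. The config parsing steps are left out because the format of schedule.xml and the config files is not given here, so the already-parsed plan is passed in directly. The names run_modules and clear_output, the shape of the plan tuples, and the use of a plain directory copy in place of a Subversion commit are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def clear_output(module_dir: Path) -> None:
    """Empty the module's output/ directory, recreating it if needed."""
    out = module_dir / "output"
    shutil.rmtree(out, ignore_errors=True)
    out.mkdir(parents=True)


def run_modules(plan, archive_dir: Path) -> None:
    """Run an ordered plan of (module_dir, command, args) tuples (hypothetical shape)."""
    # Clear every scheduled module's output first so stale results never leak into this run.
    for module_dir, _command, _args in plan:
        clear_output(module_dir)
    # Execute modules in order, each as a separate process that receives
    # its own directory followed by the arguments from its config entry.
    for module_dir, command, args in plan:
        subprocess.run([*command, str(module_dir), *args], check=True)
    # Archive each executed module's output/ directory.
    # Assumption: a plain copy stands in for committing to the output repository.
    for module_dir, _command, _args in plan:
        shutil.copytree(module_dir / "output", archive_dir / module_dir.name,
                        dirs_exist_ok=True)
```

With check=True a failing module stops the run before anything is archived. Whether the real script behaves that way is not stated in the slides.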