Title: Perl PLSQL Wrappers
1GLAST Data Handling Pipeline Daniel Flath,
Stanford Linear Accelerator Center For the GLAST
Collaboration
Stored Procedures Functions
- Did you know?
- Oracle has an internal language called PL/SQL
- PL/SQL is a strongly typed language, much like
pascal, that blends pure SQL with conditional and
program logic - PL/SQL supports both stored procedures that
perform an operation and stored functions that
return a scalar value or cursor containing table
rows - This powerful language allows SQL query results
to be manipulated before returning them, or
bookkeeping to be performed to log user access
history - The code is compiled and stored in the database
for rapid execution - The GLAST Pipeline includes over 130 stored
procedures and functions and over 2000 lines of
PL/SQL code - The pipeline does no direct manipulation of
database tables, all database access is
accomplished through the use of these native
subroutines - Stored PL/SQL can be invoked from perl, python,
java, even JSP via JDBC
PROCEDURE deleteTaskProcessByPK(TaskProcess_PK_in
IN int) IS RecordInfoList CURSORTYPE RI_PK_i
ter RecordInfo.RecordInfo_PKTYPE BEGIN --
Delete the RI_TaskProcess and RecordInfo
Rows OPEN RecordInfoList FOR select
RecordInfo_FK from RI_TaskProcess where
TaskProcess_FK TaskProcess_PK_in LOOP FET
CH RecordInfoList INTO RI_PK_iter EXIT WHEN
RecordInfoListNOTFOUND delete from
RI_TaskProcess where RecordInfo_FK
RI_PK_iter delete from RecordInfo where
RecordInfo_PK RI_PK_iter END LOOP CLOSE
RecordInfoList -- Delete TP_DS rows delete
from TP_DS where TaskProcess_FK
TaskProcess_PK_in -- Delete the
TaskProcess delete from TaskProcess where
TaskProcess_PK TaskProcess_PK_in END
Database
Perl PL/SQL Wrappers
- (See Stored Procedures Functions)
- We have developed a utility to generate perl
wrapper subroutines for every stored PL/SQL
routine used by the pipeline - These subroutines incorporate all of the tedious
PerlDBI calls to translate data types,
communicate with the database and handle
exceptions - The user only has to include the generated perl
module, and call the subroutines to gain all
functionality that the PL/SQL code provides
- Schema is normalized to provide efficient data
access for both - Management and administration of active
processing, and - Retrieval during data mining to support analysis
- Schema contains three distinct tables types
- Regular tables store the bolus of records
- Enumerated tables allow rows in regular tables
to specify one of several possible metadata
values without duplicating strings. This helps
with query speed and promotes consistency - Relational tables allow many-to-many record
relationships between two regular tables - Schema consists of two tiers
- Management tables for defining pipeline tasks
- Task table defines attributes of a task like
real-data-analysis or simulation - TaskProcess table defines a sequence of
processing steps for each task - Dataset table defines the data products of a
task, including the locations they should be
stored - TPI_DSI entries link TaskProcess and Dataset
records to define the read/write relationships
between processing and data for a task - Instance tables
- Each Run table record describes a specific
(named) instantiation of processing for some
task. There are many runs for every task. As an
example a monte carlo task may contain 1000 runs
each simulating and reconstructing 5000
particle/detector interaction events - For every Run, a TPInstance record is created for
each TaskProcess in the Runs parent task. A
TPInstance entry records statistics of the
compute-job that did the processing defined in
the corresponding TaskProcess. - Similarly for every Run, DSInstance records
(recording disk file location and statistics) are
created for each file corresponding to a Dataset
Record in the Task that owns the run - TPI_DSI records relate the TPInstance and
DSInstance processing/data history for a run
FUNCTION CREATETASK RETURNS NUMBER(22)
INPUT TASKTYPE_FK_IN
NUMBER(22) IN INPUT
TASKNAME_IN VARCHAR2
IN INPUT GLASTUSER_FK_IN
NUMBER(22) IN INPUT
NOTATION_IN VARCHAR2
IN INPUT BASEFILEPATH_IN
VARCHAR2 IN
INPUT RUNLOGFILEPATH_IN
VARCHAR2 IN sub CREATETASK
my selfshift input arguments as
hashref with keysparameters my argsshift
my dbhself-gtdbh function return
value my func_result undef PLSQL
function call my sth1dbh-gtprepare('
BEGIN func_result
DPF.CREATETASK(TASKTYPE_FK_IN, TASKNAME_IN,
GLASTUSER_FK_IN, NOTATION_IN, BASEFILEPATH_IN,
RUNLOGFILEPATH_IN) END ')
Bind Function Return Value sth1-gtbind_param_i
nout("func_result", \func_result, 0)
Binding parameters sth1-gtbind_param("TASKT
YPE_FK_IN", args-gtTASKTYPE_FK_IN)
sth1-gtbind_param("TASKNAME_IN",
args-gtTASKNAME_IN) sth1-gtbind_param("GLA
STUSER_FK_IN", args-gtGLASTUSER_FK_IN)
sth1-gtbind_param("NOTATION_IN",
args-gtNOTATION_IN) sth1-gtbind_param("BAS
EFILEPATH_IN", args-gtBASEFILEPATH_IN)
sth1-gtbind_param("RUNLOGFILEPATH_IN",
args-gtRUNLOGFILEPATH_IN) execute
sth1-gtexecute() die ! return stored
function result return func_result