Title: D2K Tutorial
1. D2K Tutorial
2. Outline
- Modules and D2K
- Command Line D2K and scripts
- Using Eclipse
- References
- http://alg.ncsa.uiuc.edu/do/tools/d2k/documentation
- http://alg.ncsa.uiuc.edu/do/tools/d2k/tutorials
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/manual/index.html
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/faq/faq.html
3. How Modules Work: Review
How To Write a Module
- A module's life cycle begins when it receives an input
- Each time the infrastructure delivers an input, it checks whether the module is ready to execute
- When the module is ready, the infrastructure queues the module for execution
- The system then assigns the module to a thread when one becomes available
- When the module executes, it does its work, provides its outputs, then exits
- If the module is still ready to execute, the system queues it again
- Otherwise, it sits idle until it is enabled again
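The life cycle above can be sketched as a toy, single-threaded simulation. This is not D2K code: ToyModule and ToyScheduler are hypothetical stand-ins, the module is assumed to have exactly one input pipe, and the real infrastructure hands queued modules to threads rather than running them in a loop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for a module with a single input pipe (not a D2K class). */
class ToyModule {
    private final Deque<Object> inputPipe = new ArrayDeque<>();
    final List<Object> outputs = new ArrayList<>();

    void deliver(Object input) {
        inputPipe.addLast(input);
    }

    /** Ready whenever at least one input is waiting. */
    boolean isReady() {
        return !inputPipe.isEmpty();
    }

    /** Consume one input, produce one output, then return. */
    void doit() {
        outputs.add("processed " + inputPipe.removeFirst());
    }
}

/** Toy stand-in for the infrastructure's scheduling loop. */
class ToyScheduler {
    private final Deque<ToyModule> readyQueue = new ArrayDeque<>();

    /** Deliver an input, then queue the module if the delivery made it ready. */
    void deliver(ToyModule module, Object input) {
        boolean wasReady = module.isReady();
        module.deliver(input);
        if (!wasReady && module.isReady()) {
            readyQueue.addLast(module);
        }
    }

    /** Run queued modules; a module that is still ready is queued again. */
    int runAll() {
        int executions = 0;
        while (!readyQueue.isEmpty()) {
            ToyModule module = readyQueue.removeFirst();
            module.doit();
            executions++;
            if (module.isReady()) {
                readyQueue.addLast(module);
            }
        }
        return executions;
    }
}
```

Delivering two inputs before calling runAll() executes the module twice: once for the first input, and once more because it is still ready afterwards.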
4. How hard is it to write a module?
- We have an API to define what a given module is.
- Modules need the following methods implemented
- Module Info (getModuleInfo)
- Input and Output Info (getInputInfo and getOutputInfo)
- Input and Output Types (getInputTypes and getOutputTypes)
- Names (getModuleName, getInputName, getOutputName)
- Module execution (doit)
- Other methods can be overridden to provide different functionality.
- Optional methods exist for providing more information about properties, etc.
5. Objects Passed Between Modules
- Tables are used often by modules
- A Table has n rows and m columns
- Columns can be any data type
- Many different Table types exist
- Table
- ExampleTable
- PredictionTable
6. Table API
- Many methods to access data
- public double getDouble()
- public void setDouble(double)
- public int getInteger()
- public void setInteger(Integer)
- etc
- Meta methods
- public int getNumRows()
- public int getNumColumns()
- public String getColumnLabel(int)
- public String getColumnType(int)
- public String getColumnComment(int)
- public void setColumnIsNominal(boolean value, int position)
- public void setColumnIsScalar(boolean value, int position)
- public boolean isColumnScalar(int position)
- public addColumn(col)
- etc.
- Table Factory
- public TableFactory getTableFactory(), used to create new columns in your table
7. Adding a column to a new Table
    Table t = (Table) pullInput(0);
    TableFactory factory = t.getTableFactory();
    MutableTable newTbl = (MutableTable) factory.createTable();
    Column col = factory.createColumn(ColumnTypes.DOUBLE);
    col.addRows(10);
    newTbl.addColumn(col);
8. Writing a Module
- Modules take some number of inputs
- Produce some number of outputs
- Inputs, outputs, and module function must all be documented
- Example: finding the mean of all entries in a Table
9. Step 1: Imports
- Must import necessary packages
- Include the base implementation of a Module
    import ncsa.d2k.core.modules.*;
- Include the interface definitions of a Table
    import ncsa.d2k.modules.core.datatype.table.*;
10. Step 2: Define the Class
- Must extend one of the basic module types
- public class TableMean extends ComputeModule
- Can also extend an existing module
- public class TableMedian extends TableMean
11. Step 3: Inputs
- Define the inputs for this module
- Inputs (and outputs) must be Objects; they cannot be primitive data types. Primitives can be boxed in Integer, Double, etc.
- Three methods need to be implemented
- public String[] getInputTypes()
- public String getInputInfo(int i)
- public String getInputName(int i)
12. Step 3a: getInputTypes()
- Defines the order and data types of the inputs to the module
    public String[] getInputTypes() {
        String[] in = {"ncsa.d2k.modules.core.datatype.table.Table"};
        return in;
    }
- If there are multiple inputs, it looks like this
    public String[] getInputTypes() {
        String[] in = {"ncsa.d2k.modules.core.datatype.table.Table",
                       "ncsa.d2k.modules.core.datatype.table.Table"};
        return in;
    }
13. Step 3b: getInputInfo(int i)
- Provide a detailed description of the input
- i is the index of the input
    public String getInputInfo(int i) {
        if (i == 0)
            return "The table we want to analyze.";
        else
            return null;
    }
14. Step 3c: getInputName(int i)
- Provide a name for each input
- i is the index of the input
    public String getInputName(int i) {
        if (i == 0)
            return "Table";
        else
            return null;
    }
15. Step 4: Outputs
- Define the outputs for this module
- Three methods need to be implemented
- public String[] getOutputTypes()
- public String getOutputInfo(int i)
- public String getOutputName(int i)
16. Step 4a: getOutputTypes()
- Defines the order and data types of the outputs of the module
    public String[] getOutputTypes() {
        String[] out = {"ncsa.d2k.modules.core.datatype.table.Table"};
        return out;
    }
- If there are multiple outputs, it looks like this
    public String[] getOutputTypes() {
        String[] out = {"ncsa.d2k.modules.core.datatype.table.Table",
                        "java.lang.Double"};
        return out;
    }
17. Step 4b: getOutputInfo(int i)
- Provide a detailed description of the output
- i is the index of the output
    public String getOutputInfo(int i) {
        if (i == 0)
            return "The table we want to analyze.";
        else if (i == 1)
            return "The mean value of all entries in the table.";
        else
            return null;
    }
18. Step 4c: getOutputName(int i)
- Provide a name for each output
- i is the index of the output
    public String getOutputName(int i) {
        if (i == 0)
            return "Table";
        else if (i == 1)
            return "Mean Value";
        else
            return null;
    }
19. Step 5: Module Info
- Provide a description of the module function
    public String getModuleInfo() {
        return "This is a module that calculates the mean of each " +
               "attribute in the table.";
    }
- Provide a name for the module
    public String getModuleName() {
        return "TableMean";
    }
20. Properties
- Properties are parameters set at runtime
- Example: a debug property
    private boolean debug = false;

    public boolean getDebug() {
        return debug;
    }

    public void setDebug(boolean val) {
        debug = val;
    }

    public PropertyDescription[] getPropertyDescriptions() {
        PropertyDescription[] pds = new PropertyDescription[1];
        pds[0] = new PropertyDescription("debug",
            "Verbose debugging output", "Print verbose debugging output");
        return pds;
    }
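The property name "debug" pairs with the getDebug/setDebug methods by the usual getter/setter naming convention. The sketch below shows how such a name can be derived from a setter using plain Java reflection. It is only an illustration: DebugSettings and PropertyNames are hypothetical names, and whether D2K discovers properties this way is an assumption.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical class with the same getter/setter pair as the debug example. */
class DebugSettings {
    private boolean debug = false;

    public boolean getDebug() {
        return debug;
    }

    public void setDebug(boolean val) {
        debug = val;
    }
}

/** Hypothetical helper: derives property names from public one-argument setters. */
class PropertyNames {
    static Set<String> settable(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            String n = m.getName();
            if (n.startsWith("set") && n.length() > 3 && m.getParameterCount() == 1) {
                // setDebug -> debug: drop "set" and lower-case the next letter
                names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
            }
        }
        return names;
    }
}
```

Calling PropertyNames.settable(DebugSettings.class) yields a set containing only "debug", which is the same string passed as the first argument to PropertyDescription in the example.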
21. doit()
- Perform the real work of the module
- Pull in the inputs
- public Object pullInput(int i)
- Push the outputs to the next module
- public void pushOutput(Object out, int i)
22. doit() example
    public void doit() throws Exception {
        Table table = (Table) pullInput(0);
        // Sum every entry in the table.
        double mean = 0;
        for (int i = 0; i < table.getNumRows(); i++) {
            for (int j = 0; j < table.getNumColumns(); j++) {
                mean += table.getDouble(i, j);
            }
        }
        // Divide by the number of entries to get the mean.
        mean /= (table.getNumRows() * table.getNumColumns());
        pushOutput(table, 0);
        pushOutput(new Double(mean), 1);
    }
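The arithmetic inside doit() can be tried without D2K installed. The sketch below is a stand-in, not the module itself: it assumes the table is a plain double[][] instead of a D2K Table, and AllEntriesMean is a hypothetical name. It also guards against an empty table, which the slide example does not address.

```java
/** Hypothetical stand-alone version of the mean computation (no D2K types). */
class AllEntriesMean {
    /** Mean of every cell in the table; NaN when the table has no cells. */
    static double mean(double[][] table) {
        double sum = 0;
        long count = 0;
        for (double[] row : table) {
            for (double value : row) {
                sum += value;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For a 2 x 2 table holding 1, 2, 3, and 4, the sum is 10 over 4 entries, so the mean is 2.5.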
23. Other methods
- beginExecution()
- endExecution()
- isReady()
24. beginExecution()
- Method called when the itinerary begins execution
- Perform initializations here
    public void beginExecution() {
        someStateVariable = false;
    }
25. endExecution()
- Called when the itinerary finishes execution
- Perform clean-up here
    public void endExecution() {
        largeMemoryObject = null;
    }
26. isReady()
- D2K modules become ready to execute whenever their enabling criteria are met.
- There are two types of enabling conditions
- receipt of data
- arrival of triggers
- Many modules will not need to make changes to their default behavior.
- By default, a module is enabled whenever each of its inputs contains data and any attached triggers have also arrived.
- The getFlags method returns an array of integers with one value for each input. For example, the first integer is the number of parcels available in the associated input pipe. Essentially, we only care that there is at least one, not how many.
- For example, this isReady() method returns true when data has arrived on either of two inputs
    public boolean isReady() {
        if (this.getFlags()[0] > 0 || this.getFlags()[1] > 0)
            return true;
        else
            return false;
    }
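The two enabling rules described here, "every input has data" versus "any input has data", can be compared side by side. The sketch below is a stand-in under stated assumptions: it takes the flags as a plain int[] argument rather than calling getFlags(), it ignores triggers, and ReadinessRules is a hypothetical name.

```java
/** Hypothetical helper contrasting two enabling rules over an array of input flags. */
class ReadinessRules {
    /** Default-style rule: every input pipe holds at least one parcel. */
    static boolean allInputsReady(int[] flags) {
        for (int parcels : flags) {
            if (parcels <= 0) {
                return false;
            }
        }
        return true;
    }

    /** Relaxed rule: at least one input pipe holds a parcel. */
    static boolean anyInputReady(int[] flags) {
        for (int parcels : flags) {
            if (parcels > 0) {
                return true;
            }
        }
        return false;
    }
}
```

With flags {1, 0}, the relaxed rule reports ready while the default-style rule does not. A module that overrides isReady() this way must also cope in doit() with an input that has not arrived yet.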
27. D2K Command Line (1)
- D2K provides a command line interface for executing D2K itineraries
- -nogui
- This argument disables the graphical user interface. If no itinerary is loaded using the -load argument, a script file should be included to programmatically create an itinerary.
- -load <file name>
- This option specifies an itinerary to load. If a full path name is not specified, D2K will look in the itineraries directory to find the itinerary. If the -nogui option is specified, the itinerary will be loaded, any script specified will be applied, and the itinerary will execute. If -nogui is not specified, the itinerary will simply be loaded into the D2K Toolkit Workspace.
- -jini <jiniurl>
- Identifies the Jini URL to use when searching for Jini-enabled D2K services. This option overrides the setting in the D2K properties file.
- -script <filename>
- If this option is included, the script in the given file will be applied to the loaded itinerary, or if no itinerary is loaded, the script can be used to create an itinerary. If the -nogui option is not specified, the script is ignored.
- -threads <number threads>
- Use this option to specify the number of threads D2K should create and employ for the execution of the itinerary. This value is typically equal to the number of processors on the machine running D2K. This option also overrides the setting in the D2K properties file.
- -vis <vis file name>
- This option is used to display a previously saved visualization.
28. D2K Command Line (2)
    java -server -Xmx256M -Xms256M -cp <CLASSPATH> ncsa.d2k.D2K -nogui -load headless.itn > outputfile
29. Scripting Itinerary Modifications
- D2K supports a number of scripting commands.
- These commands can be stored in a text file and applied to an existing itinerary, or they can create a completely new itinerary.
- add <module name> <class name>
- Add an instance of a module of the given class name to the itinerary, and name it "module name".
- set <module name> <property name> <value>
- Set the property named "property name" of the module named "module name" to the "value". The property name here is the name as determined by the names of the setter/getter methods.
- remove <module name>
- Remove the module named "module name" from the itinerary.
- link <parent module name> <output port index> <child module name> <input port index>
- Connect the module named "parent module name" to the module named "child module name". The parent's output port is indicated by "output port index" and the child's input port is indicated by "input port index".
- unlink <parent module name> <output port index>
- Disconnect the port at "output port index" from the module with the name "parent module name".
30. Script Example
- The following script illustrates how the script commands might be used
    set "Apriori" minimumSupport 40.0
    set "Compute Confidence" confidence 90.0
    remove "Rule Visualization"
    remove "RuleAssocReport"
    remove "FanOut1"
    add "Headless Rule Assoc Report" "ncsa.d2k.modules.core.discovery.ruleassociation.HeadlessRuleAssocReport"
    link "Compute Confidence" 0 "Headless Rule Assoc Report" 0
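Module names in the script contain spaces, so they are wrapped in double quotes. The sketch below shows one way a script line could be split into a command and its arguments. It is not D2K's parser: ScriptLine is a hypothetical name, and the quoting rule (whitespace separates tokens except inside double quotes, no escape characters) is an assumption drawn from the example script only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer for script lines such as: link "Compute Confidence" 0 "Report" 0 */
class ScriptLine {
    /** Split on whitespace, keeping double-quoted text together as one token. */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;   // quotes delimit a token but are not part of it
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Applied to the link line of the example script, this yields five tokens: the command, the parent module name, its output port index, the child module name, and its input port index.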
31. Using Eclipse (1)
- In Eclipse
- Select File -> New -> Project
- In New Project wizard
- Select Java Project
- Set project name to (for example) d2ktoolkit
- Select "Create separate source and output folders" under Project layout
- Click Next, not Finish
32. Using Eclipse (2)
- Select Libraries tab and click Add External JARs...
- NOTE: Or you can create a User Library in the same manner, which can be used in multiple Eclipse projects.
33. Using Eclipse (3)
- Navigate to c:\Program Files\D2KToolkit\lib
- Select all jar and zip files in this directory (this is all files except the ext/ and plugins/ subdirectories)
- Click Finish
34. Using Eclipse (4): Exporting
- Modules that use D2K should now compile.
- You need to put these compiled classes into your modules directory.
- Use the File -> Export utility.
- Select Jar File.
- Click Next.
35. Using Eclipse (5): Exporting
- Expand your project and select the src directory.
- Click on "Export generated class files and resources".
- Choose a file name under Jar File
- Click on "Compress the contents of the Jar file".
- Click Finish.
36. Executing D2K from within Eclipse (1)
- In Eclipse, select Run -> Run...
37. Executing D2K from within Eclipse (2)
- Executing D2K from within Eclipse
- Optionally, enter a new name for the run configuration (for example, toolkit)
- Set main class: ncsa.d2k.gui.ToolKit (capital T, capital K)
38. Executing D2K from within Eclipse (3)
- Increasing Memory Available
- Select Arguments tab
- Set VM Arguments to -Xmx256M
- Set working directory
- Select Arguments tab
- Under working directory, uncheck the check box and set the directory to c:\Program Files\D2KToolkit
- Select Apply and then, if desired, Run
39. How To Write a Module: Homework
40. Homework Problem
- Write a module to calculate the average for each (numeric) column in a Table.
- Output the means in a Table.
41. The ALG Team
- Staff
- Bernie Acs
- Loretta Auvil
- David Clutter
- Vered Goren
- Eugene Grois
- Luigi Marini
- Robert McGrath
- Chris Navarro
- Greg Pape
- Barry Sanders
- Andrew Shirk
- David Tcheng
- Michael Welge
- Students
- Chen Chen
- Hong Cheng
- Yaniv Eytani
- Fang Guo
- Govind Kabra
- Chao Liu
- Haitao Mo
- Xuanhui Wang
- Qian Yang
- Feida Zhu
42. References
- http://alg.ncsa.uiuc.edu/do/tools/d2k/documentation
- http://alg.ncsa.uiuc.edu/do/tools/d2k/tutorials
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/manual/index.html
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/faq/faq.html
43. Appendix: Answer to the Homework Problem
- Write a module to calculate the average for each (numeric) column in a Table.
- Output the means in a Table.
44. Details of Homework Problem
- Problem: find the mean for each column in a Table
- Input data: a two-dimensional table
- Output: a table that has the average for each column
- Approach
- Create a module that reads a Table and outputs a Table with the column averages.
- Export it to a jar file and install it in the D2K modules directory.
- Create an itinerary that reads data from a file or other source into a Table, applies the averaging module from Step 1, then outputs the results, e.g., in a TableViewer.
- Create or choose an appropriate data set, and set the input module to use the data.
- Run the itinerary.
45. A. Creating a Module
- Imports
- Define the Class
- Define the Inputs
- Define the Outputs
- Module Information
- Properties
- Do the real work: the doit()
46. Step 1: Imports
- Must import necessary packages
- Include the base implementation of a Module
    import ncsa.d2k.core.modules.*;
- Include the interface definitions of a Table
    import ncsa.d2k.modules.core.datatype.table.*;
47. Step 2: Define the Class
- Must extend one of the basic module types
- public class TableMean extends ComputeModule
- Can also extend an existing module
- public class TableMedian extends TableMean
- Hint: you may want to work from an example module. For example, the Principles of Module Development has several examples at the end. You might cut and paste one of these, such as ModelPredictive.java
- http://alg.ncsa.uiuc.edu/tools/docs/d2k/principles/index.html
48. Step 3: Inputs
- Define the inputs for this module
- Inputs (and outputs) must be Objects; they cannot be primitive data types. Primitives can be boxed in Integer, Double, etc.
- Three methods need to be implemented
- public String[] getInputTypes()
- public String getInputInfo(int i)
- public String getInputName(int i)
49. Step 3a: getInputTypes()
- Defines the order and data types of the inputs to the module
    public String[] getInputTypes() {
        String[] in = {"ncsa.d2k.modules.core.datatype.table.Table"};
        return in;
    }
- If there are multiple inputs, it looks like this
    public String[] getInputTypes() {
        String[] in = {"ncsa.d2k.modules.core.datatype.table.Table",
                       "ncsa.d2k.modules.core.datatype.table.Table"};
        return in;
    }
50. Step 3b: getInputInfo(int i)
- Provide a detailed description of the input
- i is the index of the input
    public String getInputInfo(int i) {
        if (i == 0)
            return "The table we want to analyze.";
        else
            return null;
    }
51. Step 3c: getInputName(int i)
- Provide a name for each input
- i is the index of the input
    public String getInputName(int i) {
        if (i == 0)
            return "Table";
        else
            return null;
    }
52. Step 4: Outputs
- Define the outputs for this module
- Three methods need to be implemented
- public String[] getOutputTypes()
- public String getOutputInfo(int i)
- public String getOutputName(int i)
53. Step 4a: getOutputTypes()
- Defines the order and data types of the outputs of the module
    public String[] getOutputTypes() {
        String[] out = {"ncsa.d2k.modules.core.datatype.table.Table"};
        return out;
    }
54. Step 4b: getOutputInfo(int i)
- Provide a detailed description of the output
- i is the index of the output
- This module declares a single output, the table of column means
    public String getOutputInfo(int i) {
        if (i == 0)
            return "A table that has the average for each column.";
        else
            return null;
    }
55. Step 4c: getOutputName(int i)
- Provide a name for each output
- i is the index of the output
    public String getOutputName(int i) {
        if (i == 0)
            return "Table";
        else
            return null;
    }
56. Step 5: Module Info
- Provide a description of the module function
    public String getModuleInfo() {
        return "This is a module that calculates the mean of each " +
               "attribute in the table.";
    }
- Provide a name for the module
    public String getModuleName() {
        return "TableMean";
    }
57. Step 6: Set up Properties
- Properties are parameters set at runtime
- Example: a debug property
    private boolean debug = false;

    public boolean getDebug() {
        return debug;
    }

    public void setDebug(boolean val) {
        debug = val;
    }

    public PropertyDescription[] getPropertyDescriptions() {
        PropertyDescription[] pds = new PropertyDescription[1];
        pds[0] = new PropertyDescription("debug",
            "Verbose debugging output", "Print verbose debugging output");
        return pds;
    }
58. Step 7: doit()
- Perform the real work of the module
- Pull in the inputs
- public Object pullInput(int i)
- Push the outputs to the next module
- public void pushOutput(Object out, int i)
59. doit() example (1 of 3)
    public void doit() throws Exception {
        /* read the input (from input 0) into a Table */
        Table t = (Table) pullInput(0);
        /* Use a TableFactory to create a table for the output values */
        TableFactory factory = t.getTableFactory();
        MutableTable newTbl = (MutableTable) factory.createTable();
60. doit() example (2 of 3)
        /* For each column, compute the mean across all the rows. */
        for (int i = 0; i < t.getNumColumns(); i++) {
            Column col = factory.createColumn(ColumnTypes.DOUBLE);
            col.addRows(1);
            newTbl.addColumn(col);
            newTbl.setColumnLabel(t.getColumnLabel(i) + "Mean", i);
            if (t.isColumnNumeric(i)) {
                double mean = 0;
                for (int j = 0; j < t.getNumRows(); j++) {
                    mean += t.getDouble(j, i);
                }
                mean /= t.getNumRows();
                newTbl.setDouble(mean, 0, i);
            } else {
                /* if the column is non-numeric, set mean to -1 */
                newTbl.setDouble(-1, 0, i);
            }
        }
61. doit() example (3 of 3)
        /* put the means out to port 0 */
        pushOutput(newTbl, 0);
    }
How To Write a Module
- An example solution at
- http//algdocs.ncsa.uiuc.edu/TableMean.java.
txt
63. B. Compile, create jar, install in D2K/modules
- Compile the module
- Create a Jar file (e.g., export from Eclipse)
- Install the Jar in D2K/modules
64. C. Create an Itinerary
- The itinerary must read the data into a Table, then run the TableMean module, and then output the result
- Hint: you might find an example itinerary that is similar. For example, in the D2K toolkit there is an example in
- itineraries/DataLoading/FileSupport/DelimitedFileToTable
- You could copy this itinerary, and then add the new module
65. Step 1: Copy Itinerary
- Open the Itineraries tab
- Select the itinerary, e.g., DataLoading > FileSupport > DelimitedFileToTable
- Load the Itinerary
- Select File > Save itinerary As, and save a copy of the Itinerary with a new name.
66. Step 2: Add the new Module to the itinerary
- Open the Modules tab
- Find the new module
- Drag onto work area
- Create connection from the output of the
ParseTable to the input of the new module
67. Step 3: Add visualization
- Find the ncsa/d2k/modules/core/vis/TableViewer module
- Drag onto work area
- Create a connection from the output of the new module to the input of the TableViewer
68. The New Itinerary
69. C. Select a Dataset
- Determine what data to use
- This example needs a table with one or more columns of numbers
- Example: in the D2K directory, data/UCI/iris.csv
- Set the input to the file
- Click on the input module
- Set the File Name to the dataset
70. D. Run the itinerary
- Click Run
- The result will pop up in a TableViewer window.
71. Example Result
72. Done
- ...and it's just that simple!