Title: Fca Lang
1 Fca Lang
- A language for Formal Concept Analysis
2 Note
- Slides are written in English
- Keep it simple
- English is the leading technical language
- Widening the project's horizons
3 About Formal Concept Analysis
- FCA (Formal Concept Analysis) is a way to represent and manipulate knowledge. The technique has a solid mathematical foundation (partially ordered sets, lattices, ...).
- Simply put, we represent a Formal Context as
- Context <Extent, Intent, Relation>
- Extent: the set of objects of the context
- Intent: the set of attributes of the context
- Relation: which objects have which attributes
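For reference, the standard FCA definitions in the usual notation: G is the set of objects (what these slides call the Extent of the context), M the set of attributes (the Intent) and I the Relation. In standard FCA the words extent and intent name the two halves of a formal concept (A, B), not the sets of the whole context.

```latex
\mathbb{K} = (G, M, I), \qquad I \subseteq G \times M \\
A' = \{\, m \in M \mid \forall g \in A:\ (g, m) \in I \,\} \qquad (A \subseteq G) \\
B' = \{\, g \in G \mid \forall m \in B:\ (g, m) \in I \,\} \qquad (B \subseteq M) \\
(A, B) \text{ is a formal concept} \iff A' = B \ \text{and}\ B' = A \\
(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
```

Ordered this way, the formal concepts of a context form a complete lattice (the concept lattice), which is what a line diagram draws.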
4 FCA Example
- Context: animals
- Objects: LION, FINCH, EAGLE, HARE, OSTRICH
- Attributes: preying, flying, bird, mammal
- Relation: several representations (table, line diagram)
- Concepts
- Knowledge represented by connections (as in the human mind)
- e.g. a subconcept-superconcept relation between flying and bird
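The relation table of this example did not survive in this text. The sketch below ASSUMES the classic version of the animals context (LION: preying, mammal; FINCH: flying, bird; EAGLE: preying, flying, bird; HARE: mammal; OSTRICH: bird) and computes the two derivation operators, to show why the concept generated by flying is a subconcept of the one generated by bird. AnimalsContext and its method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AnimalsContext {
    // ASSUMED relation (classic animals context), not taken from the slide.
    static final Map<String, Set<String>> HAS = Map.of(
        "LION",    Set.of("preying", "mammal"),
        "FINCH",   Set.of("flying", "bird"),
        "EAGLE",   Set.of("preying", "flying", "bird"),
        "HARE",    Set.of("mammal"),
        "OSTRICH", Set.of("bird"));
    static final List<String> OBJECTS = List.of("LION", "FINCH", "EAGLE", "HARE", "OSTRICH");
    static final List<String> ATTRIBUTES = List.of("preying", "flying", "bird", "mammal");

    // B' : the objects that have every attribute in b
    static Set<String> objectsWith(Set<String> b) {
        Set<String> out = new LinkedHashSet<>();
        for (String g : OBJECTS) {
            if (HAS.get(g).containsAll(b)) out.add(g);
        }
        return out;
    }

    // A' : the attributes shared by every object in a
    static Set<String> attributesOf(Set<String> a) {
        Set<String> out = new LinkedHashSet<>();
        for (String m : ATTRIBUTES) {
            boolean all = true;
            for (String g : a) all &= HAS.get(g).contains(m);
            if (all) out.add(m);
        }
        return out;
    }
}
```

Under this assumed relation, objectsWith({flying}) is {FINCH, EAGLE} with intent {flying, bird}, while objectsWith({bird}) is {FINCH, EAGLE, OSTRICH} with intent {bird}: the first extent is contained in the second, so the flying concept sits below the bird concept.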
5 Initial language specification
- How do we represent and manipulate knowledge?
- First, it is important to represent the static model of a Context <Extent, Intent, Relation>
- In a second stage we want to operate on the model: dynamic operations (later ...)
- Idea (static model):
    context(
        extent( ... ),   // Objects (a list of)
        intent( ... ),   // Attributes (a list of)
        relate( ... )    // Obj-Atr couples (a list of)
    )
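A minimal in-memory form of this static model, assuming plain Java lists of strings for the three parts. The class name StaticContext and the rule that every couple must use a declared object and a declared attribute are assumptions of this sketch, not part of the original specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical static model of context( extent(...), intent(...), relate(...) ).
class StaticContext {
    final List<String> extent = new ArrayList<>();   // objects
    final List<String> intent = new ArrayList<>();   // attributes
    final List<String[]> relate = new ArrayList<>(); // (object, attribute) couples

    StaticContext object(String o) { extent.add(o); return this; }
    StaticContext attribute(String a) { intent.add(a); return this; }
    StaticContext couple(String o, String a) { relate.add(new String[] {o, a}); return this; }

    // Assumed semantic rule: a couple may only use declared objects and attributes.
    List<String> errors() {
        List<String> errs = new ArrayList<>();
        for (String[] c : relate) {
            if (!extent.contains(c[0])) errs.add("unknown object " + c[0]);
            if (!intent.contains(c[1])) errs.add("unknown attribute " + c[1]);
        }
        return errs;
    }
}
```

For example, `new StaticContext().object("LION").attribute("mammal").couple("HARE", "mammal").errors()` reports `unknown object HARE`.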
6 Language Specification
- VT (terminals): (, ), ',', context, extent, intent, relate
- VN (non-terminals): SCOPE, CONTEXT, EXTENT, INTENT, RELATE, OBJLIST, ATRLIST, CNNLIST, OBJ, ATR, CNN, ID
- PR (productions):
    SCOPE → CONTEXT
    CONTEXT → context ( EXTENT , INTENT , RELATE )
    EXTENT → extent ( OBJLIST
    INTENT → intent ( ATRLIST
    RELATE → relate ( CNNLIST
    OBJLIST → OBJ , OBJLIST | OBJ )
    ATRLIST → ATR , ATRLIST | ATR )
    CNNLIST → CNN , CNNLIST | CNN )
    OBJ → ID
    ATR → ID
    CNN → ( OBJ , ATR )
7 Language refinement
- Note the list productions
    xxxLIST → xxx , xxxLIST | xxx )
- Both alternatives start with xxx, so a parser that sees only the next token cannot choose between them: it needs more than one token of lookahead.
- Left factoring refines them to
    xxxLIST → xxx ( , xxxLIST | ) )
  where the inner parentheses group the two alternatives.
- Now the grammar is LL(1): after reading xxx, the current token (',' or ')') alone selects the alternative.
- There are three similar productions: OBJ, ATR and ID. Why?
- In a future implementation we may change the syntax (and semantics) of OBJ and ATR.
- For example, we may associate a data structure with OBJ, and a web reference with both OBJ and ATR.
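The refined rule can be sketched in Java as below. ListRule is a hypothetical name, and items are plain tokens (for example IDs). The right recursion of xxxLIST is written as a loop, and after each item a single token decides whether the list continues.

```java
import java.util.ArrayList;
import java.util.List;

class ListRule {
    private final String[] tokens;
    private int pos; // index of the current token

    ListRule(String[] tokens, int start) {
        this.tokens = tokens;
        this.pos = start;
    }

    private String next() {
        if (pos >= tokens.length) throw new IllegalStateException("unexpected end of input");
        return tokens[pos++];
    }

    // xxxLIST → xxx ( , xxxLIST | ) )
    // One token after each item selects the alternative: ',' continues, ')' closes.
    List<String> parseList() {
        List<String> items = new ArrayList<>();
        while (true) {
            items.add(next());                   // xxx (here any single token)
            String t = next();
            if (t.equals(")")) return items;     // alternative ')': the list ends
            if (!t.equals(",")) {
                throw new IllegalStateException("expected ',' or ')' but found '" + t + "'");
            }
        }
    }

    int position() { return pos; }
}
```

With the tokens `LION , HARE )` the parser returns `[LION, HARE]` and stops just after the `)`.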
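8 Prolog Parser
    context(extent(E), intent(I), relate(R)) :-
        isObjList(E),
        isAtrList(I),
        isCnnList(R).

    isObjList([]).
    isObjList([H|T]) :-
        isObj(H),
        isObjList(T).

    isAtrList([]).
    isAtrList([H|T]) :-
        isAtr(H),
        isAtrList(T).

    % couples (Obj, Atr)
    isCnnList([]).
    isCnnList([(X,Y)|T]) :-
        isCnn(X, Y),
        isCnnList(T).

    % isObj/1, isAtr/1, isCnn/2 are not shown on the slide; assumed definitions:
    isObj(X) :- atom(X).
    isAtr(X) :- atom(X).
    isCnn(X, Y) :- isObj(X), isAtr(Y).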
- Lexical rule for identifiers: ID → [a-zA-Z] [a-zA-Z0-9]*
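The ID rule is a plain regular expression: a letter followed by any number of letters or digits. A small check with java.util.regex (IdRule is a hypothetical name):

```java
import java.util.regex.Pattern;

class IdRule {
    // ID → [a-zA-Z] [a-zA-Z0-9]*
    private static final Pattern ID = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // True only if the whole string is a valid ID.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }
}
```

So `LION` and `eagle2` are IDs, while `2eagle` and `flying_bird` are not.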
9 About Automatic Tools
- JavaCC
- Simple to generate scanner and parser
- Learning time: 5 hrs
- Complex code, no structure, no APT
- Maintenance time not predictable
- JavaCC + JJTree
- Simple to generate scanner, parser and APT
- Learning time: 10 hrs
- More complex code
- Maintenance time better, but not predictable
- Grammar files: FcaParser.jj (JavaCC), FcaTreeParser.jjt (JJTree)
10 Architecture Strategy
[Figure: Characters → scanner → Tokens → parser → APT]
- Classical architecture
- Robustness diagram (BCE)
- Boundary: InputStream
- Controllers: Scanner, Parser
- Entities: Characters, Tokens, Nodes
- Implementation
- Handwritten scanner and parser
- Scanner driven by an automaton (DFA)
- Recursive-descent parser
- Tokens with type, text and position
[Figure: Robustness Diagram]
11 Architecture Strategy (2)
- Tree description
- APT nodes: DefaultMutableTreeNode (Swing)
- Easy to display via a Java built-in class: JTree
- Visitors: one for validation, others for modeling data
- Further, different implementations are possible
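A minimal sketch of the visitor idea on top of Swing's DefaultMutableTreeNode, assuming each node's user object is a String label (Context, Extent, Intent, Relate, ...). AptVisitor, AptWalker and ShapeValidator are hypothetical names; the validator checks one structural rule taken from the grammar: a Context has exactly Extent, Intent and Relate children, in this order.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.tree.DefaultMutableTreeNode;

interface AptVisitor {
    void visit(DefaultMutableTreeNode node);
}

class AptWalker {
    // Depth-first, pre-order: the visitor sees a node before its children.
    static void walk(DefaultMutableTreeNode node, AptVisitor v) {
        v.visit(node);
        for (int i = 0; i < node.getChildCount(); i++) {
            walk((DefaultMutableTreeNode) node.getChildAt(i), v);
        }
    }
}

// Validation visitor: collects shape errors instead of stopping at the first one.
class ShapeValidator implements AptVisitor {
    final List<String> errors = new ArrayList<>();

    public void visit(DefaultMutableTreeNode node) {
        if (!"Context".equals(node.getUserObject())) return;
        String[] expected = {"Extent", "Intent", "Relate"};
        if (node.getChildCount() != expected.length) {
            errors.add("Context must have 3 children, found " + node.getChildCount());
            return;
        }
        for (int i = 0; i < expected.length; i++) {
            Object label = ((DefaultMutableTreeNode) node.getChildAt(i)).getUserObject();
            if (!expected[i].equals(label)) {
                errors.add("child " + i + " should be " + expected[i] + ", found " + label);
            }
        }
    }
}
```

Other visitors (for example one that builds the formal context) can reuse the same walker; only the visit method changes.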
12 APT Example
- Once a source is parsed, we obtain an APT.
- A well-formed APT should look like this:
    Taxonomy
        Context
            Extent
                Obj1, .., ObjN
            Intent
                Atr1, .., AtrN
            Relate
                Cnn1, .., CnnN
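The tree shape above can be built directly with DefaultMutableTreeNode. This sketch assumes String labels, a tiny hypothetical context (LION, HARE, mammal), and Cnn nodes holding the object and the attribute as two leaf children; AptExample is a hypothetical name. The outline method prints one node per line, indented by depth, mirroring what a JTree would show.

```java
import javax.swing.tree.DefaultMutableTreeNode;

class AptExample {
    // Adds a child labelled 'label' under 'parent' and returns it.
    static DefaultMutableTreeNode child(DefaultMutableTreeNode parent, String label) {
        DefaultMutableTreeNode n = new DefaultMutableTreeNode(label);
        parent.add(n);
        return n;
    }

    // A well-formed APT for a tiny, hypothetical context.
    static DefaultMutableTreeNode build() {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("Taxonomy");
        DefaultMutableTreeNode ctx = child(root, "Context");
        DefaultMutableTreeNode ext = child(ctx, "Extent");
        child(ext, "LION");
        child(ext, "HARE");
        DefaultMutableTreeNode intent = child(ctx, "Intent");
        child(intent, "mammal");
        DefaultMutableTreeNode rel = child(ctx, "Relate");
        DefaultMutableTreeNode c1 = child(rel, "Cnn");
        child(c1, "LION");
        child(c1, "mammal");
        DefaultMutableTreeNode c2 = child(rel, "Cnn");
        child(c2, "HARE");
        child(c2, "mammal");
        return root;
    }

    // Indented outline: two spaces per level, one node per line.
    static String outline(DefaultMutableTreeNode node) {
        StringBuilder sb = new StringBuilder();
        append(node, 0, sb);
        return sb.toString();
    }

    private static void append(DefaultMutableTreeNode node, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(node.getUserObject()).append('\n');
        for (int i = 0; i < node.getChildCount(); i++) {
            append((DefaultMutableTreeNode) node.getChildAt(i), depth + 1, sb);
        }
    }
}
```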
13 Architectural Overview
14 Scanner implementation
    TokenBuilder tBuilder = new TokenBuilder();
    Vector<Token> tokens = new Vector<Token>();
    Automa automa = new Automa();
    for (int i = 0; i < src.length; i++) {
        automa.input(src[i]);              // one input: the current char
        automa.changeState();              // DFA transition
        switch (automa.MemoryOut()) {      // memory output
            case Automa.START:
                tBuilder.initToken(i);     // remember where the token starts
                // no break: the first char is stored too
            case Automa.CONT:
                tBuilder.addChar(src[i]);  // token built incrementally
                break;
        }
        switch (automa.TokenOut()) {       // token output
            case Automa.CHAR:
            case Automa.WORD:
                tokens.add(tBuilder.getToken(i));
                break;
        }
    }
- Scanner controlled by a DFA
- One input: the current char
- One memory output, which either
- sets in memory the token's initial position in the source, or
- continues to memorize chars (the token is built incrementally)
- One token output, saying
- whether there is a token
- which token is output
15 DFA details
- A finite number of inputs
    private static final int CH_SKIP = 0;
    private static final int CH_TOK  = 1;
    private static final int CH_LET  = 2;
    private static final int CH_NUM  = 3;
    private static final int CH_EOF  = 4;
    private static final int CH_INV  = 5;
- A finite number of states
    private static final int S_INV  = -1;
    private static final int S_INIT = 0;
    private static final int S_CHAR = 1;
    private static final int S_WORD = 2;
    private static final int S_END  = 3;
- A finite number of outputs
    private static final int T_NONE = 0;
    private static final int T_CHAR = 1;
    private static final int T_WORD = 2;
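A runnable sketch of a DFA-driven scanner that uses the input, state and output constants listed above. The mapping of characters to inputs (whitespace to CH_SKIP; '(', ')', ',' to CH_TOK; ASCII letters to CH_LET; digits to CH_NUM), the Token class, and emitting a word when the automaton leaves S_WORD are assumptions of this sketch; it is not the original Automa/TokenBuilder code.

```java
import java.util.ArrayList;
import java.util.List;

class FcaScanner {
    // Inputs (character classes)
    private static final int CH_SKIP = 0, CH_TOK = 1, CH_LET = 2, CH_NUM = 3, CH_EOF = 4, CH_INV = 5;
    // States
    private static final int S_INV = -1, S_INIT = 0, S_CHAR = 1, S_WORD = 2, S_END = 3;
    // Outputs (token types)
    static final int T_NONE = 0, T_CHAR = 1, T_WORD = 2;

    // A token with type, text and start position in the source.
    static final class Token {
        final int type; final String text; final int pos;
        Token(int type, String text, int pos) { this.type = type; this.text = text; this.pos = pos; }
    }

    private static int classify(char c) {
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') return CH_SKIP;
        if (c == '(' || c == ')' || c == ',') return CH_TOK;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return CH_LET;
        if (c >= '0' && c <= '9') return CH_NUM;
        return CH_INV;
    }

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder(); // memory: chars of the current word
        int start = 0;                            // memory: where the current word began
        int state = S_INIT;
        for (int i = 0; state != S_END; i++) {
            int in = (i == src.length()) ? CH_EOF : classify(src.charAt(i));
            // Output T_WORD when a word meets a char that is neither letter nor digit.
            if (state == S_WORD && in != CH_LET && in != CH_NUM) {
                tokens.add(new Token(T_WORD, word.toString(), start));
            }
            switch (in) {
                case CH_SKIP:
                    state = S_INIT;
                    break;
                case CH_TOK: // output T_CHAR immediately
                    state = S_CHAR;
                    tokens.add(new Token(T_CHAR, String.valueOf(src.charAt(i)), i));
                    break;
                case CH_LET:
                    if (state != S_WORD) { word.setLength(0); start = i; state = S_WORD; } // START
                    word.append(src.charAt(i));                                          // CONT
                    break;
                case CH_NUM: // an ID cannot start with a digit
                    if (state == S_WORD) word.append(src.charAt(i));
                    else state = S_INV;
                    break;
                case CH_EOF:
                    state = S_END;
                    break;
                default:
                    state = S_INV;
            }
            if (state == S_INV) throw new IllegalArgumentException("invalid character at position " + i);
        }
        return tokens;
    }
}
```

Scanning `context(extent(LION, HARE2), intent(), relate())` yields 17 tokens, with `LION` as a T_WORD at position 15; an input such as `extent(2LION)` is rejected because an ID cannot start with a digit.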
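16 Parser Implementation
    void scope() {
        context();
    }

    void context() {
        read("context");
        read("(");
        extent();
        read(",");
        intent();
        read(",");
        relate();
        read(")");
    }

    void extent() {
        read("extent");
        read("(");
        objList();
    }

    void objList() {
        obj();
        Token t = readNext();
        if (",".equals(t.text)) objList();               // OBJLIST → OBJ , OBJLIST
        else if (!")".equals(t.text)) error(t);          // OBJLIST → OBJ )
    }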
- Refined grammar:
    SCOPE → CONTEXT
    CONTEXT → context ( EXTENT , INTENT , RELATE )
    EXTENT → extent ( OBJLIST
    INTENT → intent ( ATRLIST
    RELATE → relate ( CNNLIST
    OBJLIST → OBJ ( , OBJLIST | ) )
    ATRLIST → ATR ( , ATRLIST | ) )
    CNNLIST → CNN ( , CNNLIST | ) )
    OBJ → ID
    ATR → ID
    CNN → ( OBJ , ATR )
    ID → [a-zA-Z] [a-zA-Z0-9]*
- A recursive-descent parser is easy to implement
- One method for each production type
- One function to read the next token
- read(String tokContent): Token
- read(int tokType): Token
- If needed, one function to look ahead
- lookAhead(int howMany): Token
- NOTE
- read increments an internal parse index
- lookAhead does NOT!
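A compact, runnable sketch of such a recursive-descent parser for the refined grammar: one method per production, read(String) that checks the current token and advances the parse index, and lookAhead(int) that only peeks. It builds the APT with DefaultMutableTreeNode. The class name FcaDescentParser, the node labels, and the small stand-in tokenizer in the constructor (used instead of the DFA scanner to keep the block self-contained) are assumptions; read(int tokType) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.tree.DefaultMutableTreeNode;

class FcaDescentParser {
    private final List<String> toks = new ArrayList<>();
    private int index = 0; // internal parse index: read() advances it, lookAhead() does not

    FcaDescentParser(String src) {
        // Stand-in scanner (assumption): IDs, the chars ( ) , and whitespace to skip.
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')' || c == ',') {
                toks.add(String.valueOf(c));
                i++;
            } else if (isLetter(c)) {
                int j = i + 1;
                while (j < src.length() && (isLetter(src.charAt(j)) || isDigit(src.charAt(j)))) j++;
                toks.add(src.substring(i, j));
                i = j;
            } else {
                throw new IllegalStateException("invalid character at " + i);
            }
        }
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // lookAhead(1) is the current token; the index is not moved.
    String lookAhead(int howMany) {
        int k = index + howMany - 1;
        return k < toks.size() ? toks.get(k) : "<EOF>";
    }

    // read(tokContent): check the current token, then advance the index.
    String read(String expected) {
        String t = lookAhead(1);
        if (!t.equals(expected)) throw new IllegalStateException("expected '" + expected + "', found '" + t + "'");
        index++;
        return t;
    }

    private String readId() {
        String t = lookAhead(1);
        if (!t.matches("[a-zA-Z][a-zA-Z0-9]*")) throw new IllegalStateException("expected ID, found '" + t + "'");
        index++;
        return t;
    }

    // SCOPE → CONTEXT (nothing may follow)
    DefaultMutableTreeNode scope() {
        DefaultMutableTreeNode ctx = context();
        if (index != toks.size()) throw new IllegalStateException("unexpected '" + lookAhead(1) + "'");
        return ctx;
    }

    // CONTEXT → context ( EXTENT , INTENT , RELATE )
    private DefaultMutableTreeNode context() {
        read("context");
        read("(");
        DefaultMutableTreeNode node = new DefaultMutableTreeNode("Context");
        node.add(section("extent", "Extent", false));
        read(",");
        node.add(section("intent", "Intent", false));
        read(",");
        node.add(section("relate", "Relate", true));
        read(")");
        return node;
    }

    // EXTENT → extent ( OBJLIST   (INTENT and RELATE have the same shape)
    private DefaultMutableTreeNode section(String keyword, String label, boolean couples) {
        read(keyword);
        read("(");
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(label);
        list(node, couples);
        return node;
    }

    // xxxLIST → xxx ( , xxxLIST | ) ) : the current token alone picks the alternative
    private void list(DefaultMutableTreeNode parent, boolean couples) {
        parent.add(couples ? cnn() : new DefaultMutableTreeNode(readId()));
        if (lookAhead(1).equals(",")) { read(","); list(parent, couples); }
        else read(")");
    }

    // CNN → ( OBJ , ATR )
    private DefaultMutableTreeNode cnn() {
        read("(");
        DefaultMutableTreeNode node = new DefaultMutableTreeNode("Cnn");
        node.add(new DefaultMutableTreeNode(readId()));
        read(",");
        node.add(new DefaultMutableTreeNode(readId()));
        read(")");
        return node;
    }
}
```

Since every list needs at least one item in this grammar, a source with an empty `intent()` is rejected with "expected ID, found ')'".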
17 Prototyping a first application
- The first application's architecture is as follows
- A frame for interacting with users
- A listener for the parsing process
- Simple parsing operation (lexical and syntax analysis)
- Parsing validation/creation (semantic analysis)
18 Prototype 1
- Analyze the source
- Lexicon
- Syntax
- Then visualize the APT
- JTree with AptNode
19 Further Architecture Details
[Figure: Classic Recognizer]
20 Application
- Features
- Lexical, syntax and semantic analysis
- Visitors for
- Validating
- Formal context building
- Inverse source building
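Inverse source building can be sketched as a recursive walk that turns an APT back into FCA Lang text. This assumes nodes labelled Context, Extent, Intent, Relate and Cnn (a Cnn holding its object and attribute as two leaf children) and IDs stored as leaf labels; SourceBuilder is a hypothetical name. Leaves are always printed as IDs, so an object that happens to be named "Context" is still rendered correctly.

```java
import java.util.Locale;
import javax.swing.tree.DefaultMutableTreeNode;

class SourceBuilder {
    static String toSource(DefaultMutableTreeNode node) {
        String label = String.valueOf(node.getUserObject());
        if (node.isLeaf()) return label; // an object or attribute ID
        switch (label) {
            case "Context":
            case "Extent":
            case "Intent":
            case "Relate":
                return label.toLowerCase(Locale.ROOT) + "(" + joinChildren(node) + ")";
            case "Cnn":
                return "(" + joinChildren(node) + ")";
            default:
                throw new IllegalArgumentException("unexpected node " + label);
        }
    }

    // Children rendered in order and separated by ", ".
    private static String joinChildren(DefaultMutableTreeNode node) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < node.getChildCount(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(toSource((DefaultMutableTreeNode) node.getChildAt(i)));
        }
        return sb.toString();
    }
}
```

For a tree with Extent (LION, HARE), Intent (mammal) and one couple (LION, mammal), the result is `context(extent(LION, HARE), intent(mammal), relate((LION, mammal)))`, which is again valid input for the parser, so parse and rebuild can be checked as a round trip.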
21 Process Subsystems
- Process
- Iterative, incremental (XP)
- UML modeling
- Documentation
- Coding
- Tools
- Development
- Eclipse: all-round tool
- ANT to build jars, javadocs, ...
- UML
- JUDE (free)
22 Future Development
- Language extension
- A context should contain more concepts
- Visitor extension
- There should be a visitor that draws a line diagram
- After the language extension → nested line diagrams
- Knowledge base change
- Why not use Prolog as a knowledge base?
- Artificial intelligence extension
- Why not extend the standard AI model?
23 Applications
- E-learning
- Taxonomy engineering
- Knowledge representation
- Custom learning trails
- Database applications
- Web search
- Data mining
- Other ..