Title: Jean-Charles LAMIREL, Jieh HSIANG
1Using a Background Neural Model in a Digital
Library
- Jean-Charles LAMIREL, Jieh HSIANG
- Liu WJ
LORIA, Nancy, France
2The CORTEX team
- Research areas: Biologically inspired models for intelligent information management
- Applications
- Autonomous robotics and on-board intelligence
- Numerical classification (vs. symbolic)
- Information retrieval and discovery
3The CORTEX information retrieval and discovery
activity
- Main themes of research
- Interface for personalized access to information
- Intelligent multimedia data mining
- Web - Documentary database interaction
- Collaborations
- ORPAILLEUR INRIA team, INIST, La Villette, NSC Taiwan, industry...
- European projects: SCHOLNET, EISCTES
4Some examples of application
- Adaptive environment for assistance to investigation on the Web
- Multi-topographic navigation: MultiSOM
- For multimedia data mining
- For data mining on full text (patents)
- Numerical-symbolic collaboration
5Presentation summary
- Introduction
- Basic set of functionalities for information discovery
- Limitations of the classical methods for information discovery
- The MultiSOM model: Butterfly application
- Basic behaviour
- Extensions
- Management of textual information
lamirel_at_loria.fr
6Basic set of functionalities for information
discovery
- Synthetic view of the studied domain
- Distribution of the thematic indicators of the domain
- Highlighting of regularities / weak signals
- Management of several types of synthesis
- Interactivity
- Dynamic data mixture / type of need
- Choice of meta-orientation of investigation
- Setting of the granularity level of the analysis
- Multimedia
7Managing different kinds of queries for discovery
- Exploratory (no goal): What are the contents of the database?
- Thematic (general orientation): Images of space conquest
- Connotative (hidden goal, indirect search): Impressive images of human technology
- Precise: Images of Armstrong's moonwalk, July 69
8Limitations of the classical methods for
information discovery
- Overall view of the studied domain
- Noise
- Complex interpretation (hidden information)
- Local views necessarily independent
- Weak signals difficult to highlight
- No interactivity
- Passive classification
- Predefined ways to access information
9Neural methods for information cartography
- Topographic learning (SOM)
- classification
- projection
- Multi-viewpoint modelling capabilities (MultiSOM)
- Intuitive self-organization of information
- Active maps (IR Navigation)
- Low human intervention during construction
- Multimedia capabilities
10Butterfly museum application
- Different kinds of query
- Query by keywords
- Query by example
- Different kinds of criteria
- Colour (automatic)
- Shape (manual)
- Texture (manual)
- Problems
- Hand-made classifications
- Combination of results coming from different criteria
Yellow: very strong, Red: not, Edge: strong, Spot: middle, ...
11Butterfly application automation
- Butterfly application
- Viewpoint classifications
- Global and/or cross-viewpoint classifications
- Combination of results
- User interface
- Validation of insertion or classification recalculation
12Basic topographic map building
- Data description
- Document (image) index vector, e.g. vector of characteristics
- Weighting of the characteristics' modalities (very strong = 1, ...)
- Optional IDF weighting (weak signals detection)
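A minimal sketch of this data description step, under stated assumptions. Only "very strong = 1" comes from the slide; the other modality weights, the characteristic names and the three butterfly descriptions are hypothetical, and the IDF formula used is the common log(N / document frequency) form rather than a confirmed detail of the system.

```python
import math

# Hypothetical modality weights: only "very strong" = 1 is given on the slide.
MODALITY_WEIGHT = {"not": 0.0, "middle": 0.5, "strong": 0.75, "very strong": 1.0}


def index_vector(description, characteristics):
    """Map {characteristic: modality} to a weight vector in a fixed order."""
    return [MODALITY_WEIGHT[description.get(c, "not")] for c in characteristics]


def idf_weights(vectors):
    """One IDF per characteristic: log(N / number of documents that have it)."""
    n = len(vectors)
    weights = []
    for j in range(len(vectors[0])):
        df = sum(1 for v in vectors if v[j] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


# Hypothetical image descriptions.
characteristics = ["yellow", "red", "edge", "spot"]
butterflies = [
    {"yellow": "very strong", "edge": "strong", "spot": "middle"},
    {"yellow": "strong", "red": "middle"},
    {"yellow": "middle", "edge": "very strong"},
]

vectors = [index_vector(b, characteristics) for b in butterflies]
idf = idf_weights(vectors)
# IDF-weighted vectors: rare characteristics are boosted.
weighted = [[w * i for w, i in zip(v, idf)] for v in vectors]
```

In this toy data "yellow" appears in every description, so its IDF is zero, while rarer characteristics such as "red" and "spot" receive the largest weights. That is the sense in which IDF weighting helps weak signals stand out.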
13Basic topographic map building
- Map predefined parameters settings
- Number of neurons
- Structure, e.g. 2D grid with square neighbourhood
- Competitive learning
14Selection of the winning neuron
Influence on the neighbourhood
Competitive learning
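The two stages named on this slide can be sketched as a single learning step of a standard Kohonen map. This is an illustrative sketch, not the MicroNOMAD implementation: the grid size, learning rate, fixed radius and random initialisation are all assumptions, and a real map would decrease the rate and radius over time.

```python
import math
import random


def winner(weights, x):
    """Grid position (row, col) of the neuron whose weights are closest to x."""
    best, best_d = None, float("inf")
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best


def train_step(weights, x, lr=0.5, radius=1):
    """Move the winner and its square neighbourhood towards the input x."""
    wr, wc = winner(weights, x)
    for (r, c), w in list(weights.items()):
        # Square neighbourhood: Chebyshev distance on the grid.
        if max(abs(r - wr), abs(c - wc)) <= radius:
            weights[(r, c)] = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
    return (wr, wc)


random.seed(0)
rows, cols, dim = 3, 3, 4  # hypothetical map and vector sizes
som = {(r, c): [random.random() for _ in range(dim)]
       for r in range(rows) for c in range(cols)}

sample = [1.0, 0.0, 0.75, 0.5]  # hypothetical document vector
before = math.dist(som[winner(som, sample)], sample)
pos = train_step(som, sample)
after = math.dist(som[pos], sample)
```

With a learning rate of 0.5 the winning neuron ends up exactly halfway between its old position and the sample, and its grid neighbours are pulled in the same direction, which is what produces the topographic ordering.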
15Map labelization and zoning
- Map labelization
- Based on the best components of the profiles
- Class or member-oriented
- One single method is not sufficient
- => Gives an overview of the detected themes
- Map zoning
- Based on the SOM topographic properties
- Based on the best components of the class profiles
- => Gives an overview of the weights of the themes
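A small sketch of the two labelization strategies, assuming that "best components" means the highest-weighted components and that the member-oriented variant averages the vectors of a class's member documents. Both readings, and all names and values below, are assumptions for illustration.

```python
def top_components(vector, names, top=2):
    """Names of the `top` highest-weighted, non-zero components."""
    ranked = sorted(range(len(vector)), key=lambda j: vector[j], reverse=True)
    return [names[j] for j in ranked[:top] if vector[j] > 0]


def class_oriented_labels(profile, names, top=2):
    """Label a neuron from its own profile (weight vector)."""
    return top_components(profile, names, top)


def member_oriented_labels(members, names, top=2):
    """Label a neuron from the mean vector of its member documents."""
    mean = [sum(col) / len(members) for col in zip(*members)]
    return top_components(mean, names, top)


names = ["yellow", "red", "edge", "spot"]          # hypothetical characteristics
profile = [0.9, 0.1, 0.6, 0.0]                     # hypothetical neuron profile
members = [[1.0, 0.0, 0.75, 0.5], [0.5, 0.0, 1.0, 0.0]]

class_labels = class_oriented_labels(profile, names)
member_labels = member_oriented_labels(members, names)
```

On this toy data the two strategies rank "yellow" and "edge" in opposite orders, a small illustration of why a single labelization method may not be sufficient.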
17The MultiSOM model
18Map on-line generalization
- Goal
- Synthesize the map contents by decreasing the number of neurons (classes)
- Constraints
- Preserve the map topographic properties
- No classification re-computation
- Method
- Exploitation of the neighbourhood relations on
the map
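One way to meet these constraints can be sketched as follows. The slides only say that neighbourhood relations are exploited; the specific rule used here (each neuron of the coarser map is the mean of a 2 x 2 square of neighbouring neurons) is an assumption made for illustration.

```python
def generalize(grid):
    """Build a (rows-1) x (cols-1) map from a rows x cols map.

    grid[r][c] is a neuron profile (list of floats). Each new neuron is the
    mean of a 2 x 2 square of original neighbours, so no document has to be
    reclassified and adjacent new neurons still share original neurons.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(rows - 1):
        row = []
        for c in range(cols - 1):
            square = [grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1]]
            row.append([sum(vals) / 4 for vals in zip(*square)])
        coarse.append(row)
    return coarse


# Hypothetical 3 x 3 map with one-dimensional profiles.
fine = [
    [[0.0], [2.0], [4.0]],
    [[2.0], [4.0], [6.0]],
    [[4.0], [6.0], [8.0]],
]
coarse = generalize(fine)
```

Applying the function repeatedly yields a sequence of increasingly synthetic maps, each computed from the previous one without returning to the documents.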
19Map on-line generalization
21Semantic viewpoints
- Subspace of the description space
- Can be a field, a subset of keywords, ...
- Possible overlapping sets
- Concurrent or complementary viewpoints
- => Examples: indexer keywords, title keywords, authors, ..., visual characteristics, sounds
- => Butterflies: color, shape, texture, ...
22Inter-map communication
- Goal
- Cope with the limitations of a global map
- Allow communication between viewpoints
- Constraints
- Interpretable behaviour
- Method
- Re-projected data: transmitter neurons
- Two steps
- 1) Activation of a source map (directly or through a query)
- 2) Transmission to target maps
23Inter-map communication
24Inter-map communication
- A function
- Two modes
- Possibilistic (weak thematic relations over viewpoints)
- Probabilistic (measure of the themes' similarities)
=> g: class belonging degree
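The transmission function itself did not survive in the transcript, so the following is only a sketch of how the two modes could behave. It assumes that each document belongs to exactly one class per map (belonging degree fixed at 1), that a document carries the activity of its source class to its target class, and that the target activity is the maximum of the received values in possibilistic mode and their mean in probabilistic mode. The maps and documents are hypothetical.

```python
def transmit(source_activity, source_class, target_class, mode="probabilistic"):
    """Propagate activity from a source map to a target map through documents.

    source_activity: {source neuron: activity in [0, 1]}
    source_class, target_class: {document: neuron} for each viewpoint
    """
    received = {}
    for doc, target in target_class.items():
        activity = source_activity.get(source_class[doc], 0.0)
        received.setdefault(target, []).append(activity)
    if mode == "possibilistic":
        # One shared active document is enough to fully activate a class.
        return {t: max(values) for t, values in received.items()}
    # Probabilistic: share of the class that is active on the source map.
    return {t: sum(values) / len(values) for t, values in received.items()}


# Hypothetical colour and shape viewpoints over four butterflies.
colour = {"b1": "yellow", "b2": "yellow", "b3": "red", "b4": "red"}
shape = {"b1": "pointed", "b2": "round", "b3": "round", "b4": "round"}

activation = {"yellow": 1.0}  # step 1: activate the source map
possibilistic = transmit(activation, colour, shape, mode="possibilistic")
probabilistic = transmit(activation, colour, shape, mode="probabilistic")
```

Under these assumptions the possibilistic mode reveals even weak links between themes, while the probabilistic mode gives a graded measure of how strongly two themes overlap.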
25Activity coherency
STRONG FOCALIZATION
WEAK FOCALIZATION
26Inter-map communication
BUTTERFLIES
27Compliance with IR operations
Question: Are there butterflies with spots AND veins?
Response: NO
Response: YES
28Remaining problems (to be solved)
- Validation of the automatic classification results by the experts
- Testing of different result-merging methods
- Test the use of prototype features in classification
- Realization of a Web interface for the maps
- Compare the maps' built-in result combination mechanism with an external combination mechanism
- Test map capabilities for helping to add new individuals
- Introduce textual data and combine it with images
30Experimentation on patents (texts)
- Goal: Intelligent technological survey
- Full text analysis of the patents
- Domain of oil engineering
- Provide answers to questions like:
- 1. What are the relationships between patentees?
- 2. On which specific technology does a patentee work? What are the advantages of this specific technology? For which use?
31Basic experimental protocol
Patents database
DILIB reformatting
Patents in XML format, structured by viewpoints
Nominal groups extraction
Validated multi-indexes
MicroNOMAD MultiSOM
Interactive maps for analysis
32Nominal groups extraction
- 1) Lexicographic analysis (compound terms)
- 2) Normalization
- Ex: "oil fabrication" and "oil engineering" => "oil engineering"
- Results
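The normalization step can be illustrated with a toy sketch. The slides do not describe how variants are detected, so the lookup table below is purely hypothetical; it only reproduces the example given on the slide, where "oil fabrication" and "oil engineering" are merged into "oil engineering".

```python
# Hypothetical table mapping variant head words to a canonical head word.
CANONICAL_HEAD = {"fabrication": "engineering"}


def normalize(term):
    """Lower-case a compound term and replace its head word if it is a variant."""
    words = term.lower().split()
    if not words:
        return ""
    words[-1] = CANONICAL_HEAD.get(words[-1], words[-1])
    return " ".join(words)


# Hypothetical compound terms produced by the lexicographic analysis.
terms = ["oil fabrication", "Oil engineering", "friction coefficient"]
index_terms = sorted({normalize(t) for t in terms})
```

After normalization the two oil-related variants collapse into a single index term, which reduces the dimension of the index vectors used to build the maps.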
33Patents reindexing
- Selected viewpoints: title, use, advantages and patentees
34Example of dynamic analysis
DYNAMIC DEDUCTION: Patentee TONEN CORP. is a specialist in the lubrication of automatic transmissions. It mainly produces oils based on organo-molybdenum compounds, which have the specific property of a friction coefficient that stays stable over a wide range of temperatures.
35Classical methods (AK-means)
CLASSES MAP
36Conclusion
- Different viewpoints yield complementary results
- Ex: Indexer keywords: closed themes, Title keywords: open themes, ...
- Detection of indexation inconsistencies
- Projection of thematic pertinence of a query
- Bilateral synergy: images <=> textual information
- Very rich and flexible inter-map communication mechanism
- Cross analysis between viewpoints, dynamics
- No limitation regarding viewpoint type and number
37Perspectives
- Sophisticated 2D mapping, 3D mapping
- Pure image mosaic navigation
- Automation of communication between viewpoints
- Interaction with Galois lattice: map zoning and generalization, rule mapping, lattice entry points selection
- Applications
- La Villette: interactive browsing through museum collection, setting up of exhibitions
- INIST: Cartography of the Web (EISCTES EEC Project)
383) Combining Symbolic and Numeric Techniques for
DL Contents Classification and Analysis
- Jean-Charles LAMIREL,
- Yannick TOUSSAINT (Orpailleur)
39Introduction
- Combining numerical and symbolic methods
- MicroNOMAD Self Organizing Maps (SOM)
- Basic SOM topographic properties
- MicroNOMAD multi-map communication process
- Lattice
- Formal properties and symbolic deduction
- Hierarchical structure and inheritance of properties
- Study of projection of SOM over lattice
- Making explicit formal properties on the map
- Map intelligent zoning and labelization
40Galois lattice
- Symbolic hierarchical method: ({i1, i2}, {p1, p2, p3})
- Partial order defined by the subsumption relation over the set of formal concepts
- (I1, P1) ≤ (I2, P2) ⇔ I1 ⊆ I2,
- (I1, P1) ≤ (I2, P2) ⇔ P1 ⊇ P2,
- ∀ I1, I2 there is a unique meet and join.
- Inheritance of properties
- Extraction of association rules
- Search Engine → Web, IR
41I = {i1, i2, i3, i4}, P = {AI, Robots, Search Engine, Web, IR}
i1 = {Web, IR}, i2 = {Web, IR}, i3 = {Web, IR, Search Engine}, i4 = {AI, Robots}
({i1, i2, i3, i4}, ∅)
({i1, i2, i3}, {Web, IR})
({i4}, {AI, Robots})
({i3}, {Search Engine, Web, IR})
(∅, {AI, Robots, Search Engine, Web, IR})
R1: Search Engine → Web, IR
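As an illustration, the formal concepts of this small context can be enumerated by brute force, and rule R1 can be checked against it. This is a naive sketch suitable only for tiny contexts; the function names are chosen for the example and do not come from the presentation.

```python
from itertools import combinations

# The context of the slide: each individual with its set of properties.
context = {
    "i1": {"Web", "IR"},
    "i2": {"Web", "IR"},
    "i3": {"Web", "IR", "Search Engine"},
    "i4": {"AI", "Robots"},
}
ALL_PROPERTIES = set().union(*context.values())


def intent(objects):
    """Properties shared by all the given individuals."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set(ALL_PROPERTIES)


def extent(properties):
    """Individuals that have all the given properties."""
    return {o for o, props in context.items() if properties <= props}


def concepts():
    """All (extent, intent) pairs closed under the two derivations."""
    found = set()
    for k in range(len(context) + 1):
        for objs in combinations(sorted(context), k):
            p = intent(objs)
            found.add((frozenset(extent(p)), frozenset(p)))
    return found


def rule_holds(premise, conclusion):
    """True if every individual having `premise` also has `conclusion`."""
    return all(conclusion <= context[o] for o in extent(premise))


lattice = concepts()
r1 = rule_holds({"Search Engine"}, {"Web", "IR"})
```

Rule R1 holds because the only individual indexed by Search Engine, i3, is also indexed by Web and IR; the converse rule does not hold, since i1 and i2 have Web and IR without Search Engine.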
42Complementarity of approaches
- Kohonen SOM
- Complex weighting scheme
- Difficulty for precise interpretation
- Good illustrative power (topographic structure)
- Good synthesis capabilities
- Non linearity
- Lattice
- High number of classes
- Memory and time consuming
- Hierarchical structure
- Rule extraction
- Incrementality
433-steps methodology
Projection
Grouping
Agglomeration
44Conclusion
- The cosine method seems to be the best of those tested
- Good accuracy
- Well-balanced agglomeration
- Agglomeration preserves closed areas on SOM
- Other projection and agglomeration methods have to be tested
- Preservation of partial order and inheritance
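The slides do not define the cosine method. The sketch below assumes it means projecting each lattice concept onto the SOM neuron whose profile has the highest cosine similarity with the concept's intent, encoded as a binary vector. The neuron profiles are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if one is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def project(concept_intent, properties, neuron_profiles):
    """Grid position of the neuron most similar to the concept's intent."""
    vec = [1.0 if p in concept_intent else 0.0 for p in properties]
    return max(neuron_profiles, key=lambda n: cosine(vec, neuron_profiles[n]))


properties = ["AI", "Robots", "Search Engine", "Web", "IR"]
# Hypothetical neuron profiles, keyed by grid position.
neuron_profiles = {
    (0, 0): [0.9, 0.8, 0.0, 0.0, 0.0],
    (0, 1): [0.0, 0.0, 0.1, 0.9, 0.9],
    (1, 1): [0.0, 0.0, 0.8, 0.6, 0.6],
}

web_ir = project({"Web", "IR"}, properties, neuron_profiles)
search = project({"Search Engine", "Web", "IR"}, properties, neuron_profiles)
robots = project({"AI", "Robots"}, properties, neuron_profiles)
```

Concepts projected onto the same or neighbouring neurons would then be candidates for the grouping and agglomeration steps.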
45Perspectives
- Evaluation on large corpus Expert
- Rule management
- class quality evaluation
- class labelisation
- Deduction validation on communicating maps (lattice extensions)
- Implementation of an operational prototype
46Other approaches
- Multi-classifier cooperation (PhD)
- SVM
- Stigmergy
- Genetic
- Neural maps
- On-line learning of user's behaviour, intelligent relevance feedback
47Annexes
- Topographic inconsistencies
- Area computation
- Inter-map communication
- Activity coherency
48Topographic inconsistencies
NO INCONSISTENCIES
WEAK INCONSISTENCIES
STRONG INCONSISTENCIES
49Topographic inconsistencies
GLOBAL
STRONG
Neuron neighbourhood
50Area computation
[Algorithm figure: WHILE ... SO AS ... IN ... DO ... END DO]
51Inter-map communication
52Viewpoint oriented Patents Analysis
- Selected viewpoints: title, use, advantages and patentees
53Themes "extending oil life" and "black sludge control" are strongly linked together because they are neighbours on the map
The appearance of black sludge has a negative effect on the friction coefficient of oil
MAP OF VIEWPOINT ADVANTAGES