Title: MICCE
1MICCE
- Mutual Information Contextual Concept Extractor
Dr. Deborah Duong, Virginia Tech Applied Research Laboratory for National and Homeland Security, dduong_at_vt.edu
Mike Ross, Science Applications International Corp., Integrated Intelligence Solutions Operation, michael.s.ross_at_saic.com
2What is MICCE?
- Flexible architecture for discovering and annotating concepts in unstructured data (based on work of Lin, Pantel), in development at SAIC
- Inspired by cognitive principles of perception
- Applicable to symbol grounding
3Main Idea
- Things which appear in similar contexts are
semantically similar (even if their surface
features differ)
4Current Results
- Applied to 100,000 Reuters News articles from 1996/1997
- upturn, decline, slowdown, improvement
- guilder, crown, penny, franc
- Bill Clinton, Oscar Luigi Scalfaro, Jacques Chirac, Nelson Mandela, Boris Yeltsin, Aleksander Kwasniewski, Hosni Mubarak
- learn, feel, believe, notice, felt, know, guess, think
- slide, dip, gain
- review, evaluate, study, examine
- sorry, comfortable, ashamed
- northeast, northwest, southwest, southeast
- contingent, pursuant, conditioned, subject, conditional
- certainly, obviously, simply, clearly
- recently, yesterday
5CBC Context Feature
- Text is ingested as sequences of word-stems
- Parsed by error-prone dependency parser
- Multiple context types (more can be added)
- Compute Mutual Information between every word and every context.
- The cook ate the salad with the onions.
- -- EAT (1.0)
- -- SALAD (1.0)
- -- ONION (1.0)
- --subj-- EAT (1.0)
- --subj-- EAT --obj-- SALAD (1.0)
- --subj-- EAT --with-- ONION (0.5)
- --subj-- EAT --obj-- SALAD --with-- ONION (0.5)
COOK...
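The mutual information step can be illustrated with a minimal sketch. It assumes pointwise mutual information over weighted (word, context) counts; the slides do not state the exact estimator, and the observation list, the `pmi_table` helper, and every word other than those in the example sentence are hypothetical. The 0.5 weight stands for the ambiguous attachment of "with the onions".

```python
import math
from collections import defaultdict

# Hypothetical weighted (word, context, weight) observations.
# Only the "cook" rows follow the slide; the rest are made up for contrast.
observations = [
    ("cook", "--subj-- EAT", 1.0),
    ("cook", "--subj-- EAT --obj-- SALAD", 1.0),
    ("cook", "--subj-- EAT --with-- ONION", 0.5),
    ("chef", "--subj-- EAT", 1.0),
    ("chef", "--subj-- COOK", 1.0),
    ("salad", "--obj-- EAT", 1.0),
]


def pmi_table(obs):
    """Return {(word, context): PMI in bits} from weighted co-occurrences."""
    joint = defaultdict(float)
    word_total = defaultdict(float)
    ctx_total = defaultdict(float)
    total = 0.0
    for word, ctx, weight in obs:
        joint[(word, ctx)] += weight
        word_total[word] += weight
        ctx_total[ctx] += weight
        total += weight
    # PMI(w, c) = log2( p(w, c) / (p(w) * p(c)) )
    return {
        (w, c): math.log2((n * total) / (word_total[w] * ctx_total[c]))
        for (w, c), n in joint.items()
    }


pmi = pmi_table(observations)
for (word, ctx), score in sorted(pmi.items()):
    print(f"{word:6s} {ctx:32s} {score:6.3f}")
```

A context shared by many words (here `--subj-- EAT`) scores lower than one seen with a single word, which is the property that makes the scores usable as features.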
6CBC Similarity Distance
7CBC Similarity Distance
- Clusters are sets of words which appear in
similar contexts.
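The distance formula itself did not survive in these slides. As a sketch only, the block below assumes cosine similarity between sparse vectors of mutual information scores, which is the measure used in Pantel and Lin's CBC work; the vectors and their values are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical context -> mutual information vectors (values are made up).
guilder = {"--obj-- EXCHANGE": 2.0, "--mod-- DUTCH": 3.0, "--subj-- FALL": 1.0}
franc = {"--obj-- EXCHANGE": 2.5, "--mod-- FRENCH": 3.0, "--subj-- FALL": 1.5}
shrub = {"--obj-- PLANT": 2.0, "--mod-- GREEN": 1.0}

print(cosine(guilder, franc))  # shares contexts, so greater than zero
print(cosine(guilder, shrub))  # no shared contexts
```

Words whose context vectors point in similar directions would then be candidates for the same cluster.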
8CBC Decomposition
- Clusters act as basis vectors
- plant decomposes into (warehouse, factory, facility) and (shrub, flower, bush)
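One way to read "clusters act as basis vectors" is a greedy decomposition: match the word to its closest cluster, remove the contexts that cluster explains, and repeat on what is left. The sketch below takes that reading as an assumption, not as the MICCE implementation; `decompose`, the threshold, the cluster names, and all vector values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decompose(word_vec, centroids, threshold=0.2):
    """Greedily list the clusters that explain word_vec, best match first."""
    if not centroids:
        return []
    residual = dict(word_vec)
    senses = []
    while residual:
        name, score = max(
            ((n, cosine(residual, c)) for n, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if score < threshold:
            break
        senses.append(name)
        # Drop the contexts this cluster accounts for, then look again.
        residual = {k: v for k, v in residual.items() if k not in centroids[name]}
    return senses


# Hypothetical cluster centroids and word vector (values are made up).
centroids = {
    "building": {"--obj-- BUILD": 2.0, "--mod-- MANUFACTURING": 2.0},
    "vegetation": {"--obj-- WATER": 2.0, "--mod-- GREEN": 2.0},
}
plant = {
    "--obj-- BUILD": 1.5,
    "--mod-- MANUFACTURING": 1.0,
    "--obj-- WATER": 1.0,
    "--mod-- GREEN": 0.5,
}

print(decompose(plant, centroids))
```

Because each accepted cluster removes at least one context from the residual, the loop always terminates.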
9Feedback? (we hope)
- Mixing top-down and bottom-up processing
- The cook ate the salad with the onions.
- -- EAT, CONSUME, DINE
- -- SALAD, SOUP, SANDWICH, LETTUCE
- -- ONION, PICKLE, TOMATO, GARLIC
COOK...
Underlying data may then be reinterpreted: with(eat, onion) vs. with(salad, onion)
Concepts are formed from unstructured data
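The reinterpretation step is presented here as a hope rather than a result, so the following is only a sketch of how it might work: the even 0.5/0.5 split between the two attachments is replaced by weights proportional to smoothed cluster-level evidence. The function name, the cluster labels, the counts, and the smoothing constant are all hypothetical.

```python
def reweight_attachment(candidates, cluster_of, cluster_counts, smoothing=1.0):
    """Split one unit of weight across candidate (head, dependent) pairs
    in proportion to smoothed counts of their (head, dependent) clusters."""
    scores = []
    for head, dep in candidates:
        key = (cluster_of[head], cluster_of[dep])
        scores.append(cluster_counts.get(key, 0.0) + smoothing)
    total = sum(scores)
    return {cand: s / total for cand, s in zip(candidates, scores)}


# Hypothetical cluster assignments and cluster-level "with" counts.
cluster_of = {"eat": "EAT-LIKE", "salad": "FOOD", "onion": "INGREDIENT"}
cluster_counts = {("FOOD", "INGREDIENT"): 8.0, ("EAT-LIKE", "INGREDIENT"): 2.0}

candidates = [("eat", "onion"), ("salad", "onion")]
weights = reweight_attachment(candidates, cluster_of, cluster_counts)
print(weights)
```

With these made-up counts the with(salad, onion) reading gains weight, and the updated weights could feed back into the next round of context counting.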
10Statistics vs. Cognitive/Neural Models of Perception
- Similarity vs. Neuron co-firing
- Mutual Information vs. Novelty/Expectedness
- Vector decomposition vs. Hierarchical neural layers
- Trimming vectors vs. Forgetting
11Symbol Grounding
- In MICCE, concepts are essentially abstract descriptions of relationships between environmental symbols (percepts).
- In an AI, these descriptions could be reinforced by additional perceptual data, or by homomorphism between symbolic structure and conceptual structure
Cook --agent-- Eat-Event --patient-- Salad
cook --subj-- eat --obj-- salad
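The homomorphism idea can be made concrete with a small check: a mapping from the syntactic structure to the conceptual structure counts as structure-preserving if every labelled edge on one side lands on an edge on the other. This is a sketch under that assumption; the edge-triple representation and the `is_homomorphism` helper are hypothetical, and only the two example structures come from the slide.

```python
def is_homomorphism(node_map, label_map, source_edges, target_edges):
    """True if every labelled source edge maps onto an edge in the target."""
    target = set(target_edges)
    return all(
        (node_map[head], label_map[label], node_map[dep]) in target
        for head, label, dep in source_edges
    )


# Edges are (head, relation, dependent) triples.
syntax = [("eat", "subj", "cook"), ("eat", "obj", "salad")]
concepts = [("Eat-Event", "agent", "Cook"), ("Eat-Event", "patient", "Salad")]

node_map = {"eat": "Eat-Event", "cook": "Cook", "salad": "Salad"}
label_map = {"subj": "agent", "obj": "patient"}

print(is_homomorphism(node_map, label_map, syntax, concepts))
```

Swapping the relation mapping (subj to patient, obj to agent) makes the check fail, which is the sense in which a consistent mapping could reinforce a concept.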