NSF-Relevant Challenges in Computational Intelligence

Slides: 23
Provided by: erichny
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes


1
NSF-Relevant Challenges in Computational
Intelligence
  • Jaime Carbonell (jgc_at_cs.cmu.edu)
  • Tom Mitchell, Guy Blelloch, Randy Bryant, et al.
  • School of Computer Science
  • Carnegie Mellon University
  • 26-April-2007

I) Major Computational Intelligence Research Areas
II) Next-Generation Infrastructure (DISC)
2
Computational Intelligence
  • Machine Learning
  • Inductive learning algorithms, active learning
  • Data mining and novel pattern detection
  • Language Technologies
  • Multilingual next-generation search engines
  • Machine translation (e.g. Arabic → English)
  • Perception
  • Computer vision, tactile sensing (e.g., in
    robotics)
  • Planning and optimizing
  • Reasoning and planning under uncertainty
  • Non-linear optimization (beyond O.R.)
    with uncertainty
  • Key scientific applications
  • Proteomics, genomics, computational biology
  • Modeling human brain functions

3
Machine Learning
Speech Recognition
  • Reinforcement learning
  • Predictive modeling
  • Pattern discovery
  • Hidden Markov models
  • Convex optimization
  • Explanation-based learning
  • ....

Automated Control learning
Extracting facts from text
4
Leveraging Existing Data Collecting Systems
1999 Influenza outbreak
Influenza cultures
Sentinel physicians
WebMD queries about cough etc.
School absenteeism
Sales of cough and cold meds
Sales of cough syrup
ER respiratory complaints
ER viral complaints
Influenza-related deaths
Week (1999-2000)
Moore, 2002
5
Cluster Evolution and Density Change Detection
$d^2 F(r(t)) / dt^2$
6
Classifier: Rocchio; Topic: Civil War (R76 in
TREC10); Threshold: MLR
7
Info-Age Bill of Rights
  • Get the right information (Search Engines)
  • To the right people (Personalization)
  • At the right time (Anticipatory Analysis)
  • On the right medium (Speech Recognition)
  • In the right language (Machine Translation)
  • With the right level of detail (Summarization)
8
MMR vs Current Search Engines
[Figure labels: documents, query, MMR, IR]
λ controls spiral curl
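
The slide contrasts Maximal Marginal Relevance (MMR) with plain relevance ranking (IR). As a rough illustration only, the sketch below implements the commonly cited greedy form of MMR, in which each step picks the document maximizing λ·(similarity to the query) minus (1−λ)·(maximum similarity to documents already selected). The cosine similarity, the toy vectors, and the names `cosine` and `mmr_rank` are assumptions made for this example, not details taken from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_rank(query, docs, lam=0.7, k=None):
    """Greedy MMR: return document indices in selection order.

    lam=1.0 reduces to plain relevance ranking; smaller lam favors diversity.
    """
    k = len(docs) if k is None else min(k, len(docs))
    selected = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(docs[i], query)
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: doc 1 is a near-duplicate of doc 0, doc 2 covers a different aspect.
query = [1.0, 1.0, 0.0]
docs = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [1.0, 0.0, 1.0]]
print(mmr_rank(query, docs, lam=1.0))  # relevance only
print(mmr_rank(query, docs, lam=0.3))  # diversity-weighted
```

With λ = 1.0 the near-duplicate is ranked second; with a smaller λ the more novel document is promoted ahead of it, which is the trade-off the slide attributes to λ.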
9
Types of Machine Translation
  • Interlingua

Semantic Analysis
Sentence Planning
Syntactic Parsing
Text Generation
Transfer Rules
Source (Arabic)
Target (English)
Direct: SMT, EBMT
Requires Massive Data Resources
10
2005 NIST Arabic-English MT
  • Interlingual MT
  • Grammars, semantics
  • Best for focused domains
  • Corpus-Based MT
  • Pre-translated text (10-200M words)
  • Target language text (100M to 1 trillion words)
  • Best for general MT
  • Context-Based MT
  • Improved variant of corpus-based MT
  • Perfect client for DISC

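[Figure: BLEU score (axis from 0.0 to 0.7) by system. Systems in the order they appear in the transcript, top to bottom: Google, ISI, IBM, CMU, UMD, JHU-CU, Edinburgh, Systran, Mitre, FSC. Quality bands marked along the axis, top to bottom: expert human translator, usable translation, human-editable translation, topic identification, useless region.]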
11
Arabic Statistical-MT Output
[Arabic source text lost to character encoding in this transcript]
Beijing January 17 / Shinhua / the Chinese and
Russian officials urged all parties concerned to
" remain calm and exercise restraint " over the
nuclear issue of the Democratic People's Republic
of Korea. He met with vice Chinese foreign
minister Yang Chang won the deputy of the Russian
foreign minister Alexander Losyukov at a lunch
with invited interested parties to continue the
search for a peaceful solution through dialogue
under the current complicated situation.
BLEU: 0.64
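
For readers unfamiliar with the metric, BLEU scores a candidate translation by its n-gram overlap with reference translations. The sketch below is a simplified sentence-level version under stated assumptions: a single reference, whitespace tokenization, n-grams up to length 4, and no smoothing. Evaluation campaigns typically score at corpus level against several references, so this would not reproduce the figure on the slide. The function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat today"))
```

In this usage line every candidate n-gram appears in the reference, so the score falls below 1.0 only because of the brevity penalty.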
12
What About Minor Languages or Dialects without
Massive Data?
13
(Borrowed from Judith Klein-Seetharaman)
PROTEINS: Sequence → Structure → Function
Primary Sequence
MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML
AAYMFLLIML GFPINFLTLY VTVQHKKLRT PLNYILLNLA
VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT
WVMALACAAP PLVGWSRYIP EGMQCSCGID YYTPHEETNN
ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES
ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG
SDFGPIFMTI PAFFAKTSAV YNPVIYIMMN KQFRNCMVTT
LCCGKNPLGD DEASTTVSKT ETSQVAPA
Folding
3D Structure
Complex function within network of proteins
Normal
15
Predicting Protein Structures
  • Protein structure is a key determinant of protein
    function
  • Crystallography to resolve protein structures
    experimentally in vitro is very expensive; NMR
    can only resolve very small proteins
  • The gap between the known protein sequences and
    structures:
  • 3,023,461 sequences vs. 36,247 resolved
    structures (1.2%)
  • Therefore we need to predict structures in silico
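
The size of the gap can be checked directly from the two counts on the slide. The variable names below are chosen for this example.

```python
# Counts as given on the slide.
sequences = 3_023_461   # known protein sequences
structures = 36_247     # experimentally resolved structures

coverage = structures / sequences
print(f"{coverage:.1%} of known sequences have a resolved structure")
print(f"about {sequences / structures:.0f} sequences per resolved structure")
```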

16
Linked Segmentation CRF
  • Nodes: secondary structure elements and/or simple
    folds
  • Edges: local interactions and long-range
    inter-chain and intra-chain interactions
  • The L-SCRF conditional probability of y given x is
    defined as: [equation not preserved in the transcript]
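
The L-SCRF formula itself does not appear in the transcript. For orientation only, the general form of a conditional random field over a graph with nodes $V$ and edges $E$ is shown below. This is the generic textbook form, not the slide's equation: the feature functions $f_k$, $g_l$ and weights $\lambda_k$, $\mu_l$ are assumed notation, and the specific L-SCRF definition over segments and chains may differ.

```latex
P(\mathbf{y} \mid \mathbf{x}) =
  \frac{1}{Z(\mathbf{x})}
  \exp\Bigg(
    \sum_{v \in V} \sum_{k} \lambda_k \, f_k(\mathbf{x}, y_v)
    + \sum_{(u,v) \in E} \sum_{l} \mu_l \, g_l(\mathbf{x}, y_u, y_v)
  \Bigg),
\qquad
Z(\mathbf{x}) =
  \sum_{\mathbf{y}'}
  \exp\Bigg(
    \sum_{v \in V} \sum_{k} \lambda_k \, f_k(\mathbf{x}, y'_v)
    + \sum_{(u,v) \in E} \sum_{l} \mu_l \, g_l(\mathbf{x}, y'_u, y'_v)
  \Bigg)
```

Here the node terms would score individual structure elements and the edge terms would score the local and long-range interactions listed above.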

17
Fold Alignment Prediction: β-Helix
  • Predicted alignment for known β-helices on
    cross-family validation

18
fMRI to observe human brain activity
Machine learning to discover patterns in complex
data
New discoveries about human brain function
Our algorithms have learned to distinguish
whether a human subject is reading a word about,
e.g., tools or buildings, with 90% accuracy
19
Requisite Infrastructure
  • Data Intensive SuperComputing (DISC) for
    tera-scale and peta-scale data repositories
  • Advanced algorithms research
  • Massively-parallel decomposition
  • Scalability in analytics and learning
  • Extracting compact models for run-time
  • Planning, reasoning, learning with uncertainty
  • Active Learning (maximally reducing uncertainty)
  • Domain expertise (e.g. proteomics, neural
    sciences, astronomy, network security, ...)
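
One simple way to read "maximally reducing uncertainty" is uncertainty sampling: ask for a label on the example the current model is least sure about. The sketch below uses predictive entropy as the uncertainty measure. This is a common heuristic chosen for illustration, not necessarily the criterion the presenters had in mind, and the probability values are made up.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def most_uncertain(pool_probs):
    """Index of the unlabeled example whose predicted class distribution has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))

# Hypothetical model outputs for three unlabeled examples (two classes each).
pool = [[0.90, 0.10], [0.55, 0.45], [0.99, 0.01]]
print(most_uncertain(pool))  # index of the example to send for labeling
```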

20
System Comparison: Data
DISC system:
  • System collects and maintains data
  • Shared, active data set
  • Computation colocated with storage
  • Faster access
Conventional supercomputer system:
  • Data stored in separate repository
  • No support for collection or management
  • Brought into system for computation
  • Time consuming
  • Limits interactivity

21
Program Model Comparison
DISC software stack (top to bottom): Application Programs, Machine-Independent Programming Model, Runtime System, Hardware
  • Application programs written in terms of
    high-level operations on data
  • Runtime system controls scheduling, load
    balancing, ...
Conventional supercomputer software stack (top to bottom): Application Programs, Software Packages, Machine-Dependent Programming Model, Hardware
  • Programs described at very low level
  • Specify detailed control of processing and
    communications
  • Rely on small number of software packages
  • Written by specialists
  • Limits classes of problems and solution methods
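
To make the contrast concrete, the sketch below shows what "high-level operations on data" might look like when a runtime, rather than the application, decides how work is scheduled. It uses a map/reduce pattern as a stand-in. The slides do not name a specific programming model, so the choice of map/reduce and the names `run_map_reduce`, `count_words`, and `total` are assumptions for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_map_reduce(records, map_fn, reduce_fn, workers=4):
    """Toy runtime: applies map_fn in parallel, groups results by key, then reduces each group."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, records))  # scheduling is the runtime's job
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# The application only states what to compute per record and per key.
def count_words(line):
    return [(word.lower(), 1) for word in line.split()]

def total(_key, values):
    return sum(values)

counts = run_map_reduce(
    ["DISC stores data", "data stays near compute", "disc"],
    count_words,
    total,
)
print(counts)
```

The application code contains no threads, message passing, or placement decisions; swapping the thread pool for a cluster scheduler would not change `count_words` or `total`.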

22
Final Thoughts
  • Opportunities in Computational Intelligence
  • Machine learning for tough problems: relevant
    novelty detection, structural learning, active
    learning
  • Scientific applications: Computational X
    (X = biology, linguistics, astrophysics, chemistry,
    ...)
  • Next generation computational infrastructure
  • DISC principle (beyond HPC, beyond grid, ...)
  • Algorithmic fundamentals
  • International programs (on common problems)