Application Areas - PowerPoint PPT Presentation

Slides: 40
Provided by: NKU

Transcript and Presenter's Notes

Title: Application Areas

Application Areas
  • For two lectures, we will examine a number of
    application areas of AI research
  • Visual image understanding
  • Speech recognition
  • Natural language processing
  • mostly we'll look at NL understanding
  • but we will briefly talk about NL generation and
    machine translation
  • Search engine technology
  • Next time, we will cover the following topics and
    possibly others (specific topics to be
  • Homeland security
  • Sensor interpretation
  • Robotic vehicles
  • AI in space

Perception Problems
  • Vision understanding, natural language processing
    and speech recognition all have several things in
    common
  • Each problem is so complex that it must be solved
    through a series of mappings from subproblem to
    subproblem
  • Each problem requires a great deal of knowledge
    that is not necessarily available or
    well-understood such that successful applications
    often utilize non-knowledge-based mechanisms
  • Each problem contains some degree of uncertainty,
    often implemented using HMMs or neural networks
  • Early approaches in AI were symbolic and often
    suffered for several reasons
  • Poor run-time performance (because of the sheer
    amount of knowledge needed and the slow
    processors of the time)
  • Models that were based on our incomplete
    knowledge of human (or animal) vision, auditory
    abilities and language
  • Lack of learning so that knowledge acquisition
    was essential
  • Research into all three areas has progressed, but

Vision Understanding Mapping
  • The vision problem is shown to the left as a
    series of mappings
  • Low level processing and filtering to convert
    from analog to digital
  • Pixels → edges/lines
  • Lines → regions
  • Regions → surfaces
  • Add texture, shading, contours
  • Surfaces → objects
  • Classify objects
  • Analyze scene (if necessary)

A Few Details
  • Computer vision has been studied for decades
  • There is no single solution to the problem
  • Each of the mappings has its own solution
  • often mathematical, such as the edge detection
    and mapping from edges to regions
  • often applies constraint satisfaction algorithms
    to reduce the amount of search or computation
    required for low level processes
  • The intelligence part really comes in toward
    the end of the process
  • Object classification
  • Scene analysis
  • Surface and object disambiguation (determine
    to which object a particular surface belongs,
    dealing with optical illusions)
  • Computer vision is practically an entire CS
    discipline and so is beyond the scope of what we
    can cover here (sadly)

Edge Detection
  • Waltz created an algorithm for edge detection by
  • finding junction points (intersections of lines)
  • determining the orientation of the lines into the
    junction points
  • applying constraint satisfaction to select which
    lines belong to which surfaces
  • Below, convex edges are denoted with + and
    concave edges with -
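
The constraint-satisfaction step can be sketched roughly as follows. This is a minimal illustration, not Waltz's actual program: the junction names, edge names and candidate labelings are made-up toy data rather than the real junction catalogue, and `waltz_filter` is a hypothetical helper name. Each junction starts with several candidate labelings, and a labeling is dropped when the junction at the other end of an edge has no candidate that agrees on that edge's label.

```python
def waltz_filter(candidates, edges):
    """Discard junction labelings that no neighbouring junction can agree with.

    candidates: junction -> list of labelings, each a dict edge -> label.
    edges: edge -> (junction_a, junction_b), the two junctions the edge joins.
    """
    changed = True
    while changed:  # repeat until no labeling is removed in a full pass
        changed = False
        for junction, labelings in candidates.items():
            for labeling in list(labelings):
                for edge, label in labeling.items():
                    a, b = edges[edge]
                    neighbour = b if junction == a else a
                    supported = any(other.get(edge) == label
                                    for other in candidates[neighbour])
                    if not supported:
                        labelings.remove(labeling)
                        changed = True
                        break
    return candidates


# Toy data (assumed for illustration): three junctions joined by two edges.
edges = {"e1": ("J1", "J2"), "e2": ("J2", "J3")}
candidates = {
    "J1": [{"e1": "+"}, {"e1": "-"}],
    "J2": [{"e1": "+", "e2": "-"}, {"e1": "-", "e2": "+"}],
    "J3": [{"e2": "-"}],
}
filtered = waltz_filter(candidates, edges)
```

In this toy case, J3 rules out one labeling at J2, which in turn rules out one at J1, so the constraint propagates across the drawing without any search.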

Other approaches may be necessary for curve,
contour or blob detection and analysis. Often,
these approaches use such mathematical models as
eigen-models (eigenform, eigenface), quadrics
or superquadrics, distance measures, closest
point computations, etc.

For trihedral junction points (intersection of 3
lines), these are the 18 legal connections

Vision Sub-Applications
  • Machine produced character recognition
  • Solved satisfactorily through neural networks
  • Hand-written character recognition
  • Many solutions: neural networks, genetic
    algorithms, HMMs, Bayesian probabilities, nearest
    neighbor matching approaches, symbolic approaches
  • printed character recognition highly accurate,
    cursive recognition greatly varies
  • Face recognition
  • Many solutions, often the solutions attempt to
    map the face's contours and texture into
    mathematical equations and Gaussian distributions
  • Image stabilization and image (object) tracking
  • Solutions include neural networks, fuzzy logic,
    best fit search
  • UAV input: we'll discuss UAVs next time

Two Approaches to Handwritten Character Recognition

Neural networks with voting; symbolic approach
using pattern matching

Speech Recognition
  • In spite of the fact that research began in
    earnest in 1970, speech recognition is a problem
    far from solved
  • Problems
  • Multispeaker: people speak at different
  • Continuous speech: the speech signal for a given
    sound is dependent on the preceding and
    succeeding sounds, and in continuous speech, it
    is extremely hard to find the beginning of a new
    word, making the search process even more
    computationally difficult
  • Large vocabularies: not only does this
    complicate the search process because there might
    be many words that match, but it also brings in
  • SR attempts
  • Knowledge based approaches (particularly Hearsay)
  • Neural networks
  • Hidden Markov Models
  • Hybrid approaches

The Task Pictorially
The speech signal is segmented; overlapping
segments are processed to create a small window
of speech. Processing typically involves FFT and
Linear Predictive Coding (LPC) analysis. This
provides a series of energy patterns at different
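
As a rough illustration of the segmentation and spectral step, the sketch below cuts a signal into overlapping windows and computes the energy in each frequency bin. It is a simplification under stated assumptions: a naive DFT stands in for an FFT, LPC analysis is left out entirely, and the window size, step and test tone are arbitrary choices made for the example.

```python
import cmath
import math


def frames(signal, size, step):
    """Split a signal into overlapping windows of `size` samples, advancing by `step`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def energy_spectrum(frame):
    """Naive DFT: energy |X[k]|^2 for each frequency bin from 0 up to Nyquist."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Toy signal (assumed): a pure tone completing 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
windows = frames(signal, size=32, step=16)      # 50% overlap between windows
spectra = [energy_spectrum(w) for w in windows]
peak_bin = spectra[0].index(max(spectra[0]))    # bin holding the most energy
```

Each entry of `spectra` is one "energy pattern" for one window; a real front end would pass these on to LPC analysis or a codebook.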

Phonetic Dependence
  • Below are two wave forms created by uttering the
    same vowel sound, ee as in three (on the left)
    and tea (on the right)
  • notice how dissimilar the ee portion is, in
    fact the one on the right is even longer
  • this problem is caused because of co-articulation
    one sound will directly impact the next sound

  • Hearsay-II attempted to solve the problem through
    symbolic problem solvers
  • Each called a knowledge group
  • They would communicate through a global mechanism
    called a blackboard
  • Each KG knew what part of the BB to read from and
    where to post partial conclusions to
  • A scheduler would use a complex algorithm to
    decide which KG should be invoked next based on
    priority of KGs and what knowledge was currently
    available on the BB
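
The blackboard idea can be sketched in a few lines. This is only a toy model of the architecture described above, not Hearsay-II's actual design: the level names, the rules, the priorities and the "highest priority wins" scheduler are all assumptions made for illustration.

```python
class Blackboard:
    """Shared store of hypotheses, organised by level."""

    def __init__(self):
        self.levels = {}

    def post(self, level, hypothesis):
        self.levels.setdefault(level, []).append(hypothesis)

    def read(self, level):
        return list(self.levels.get(level, []))


class KnowledgeGroup:
    """Reads hypotheses from one level and posts conclusions to another."""

    def __init__(self, name, reads, writes, priority, rule):
        self.name, self.reads, self.writes = name, reads, writes
        self.priority, self.rule = priority, rule
        self.consumed = 0  # how many input hypotheses have been processed

    def ready(self, bb):
        return len(bb.read(self.reads)) > self.consumed

    def run(self, bb):
        pending = bb.read(self.reads)[self.consumed:]
        self.consumed += len(pending)
        for hypothesis in pending:
            bb.post(self.writes, self.rule(hypothesis))


def schedule(bb, groups):
    """Repeatedly invoke the highest-priority KG that has unread input."""
    trace = []
    while True:
        ready = [kg for kg in groups if kg.ready(bb)]
        if not ready:
            return trace
        chosen = max(ready, key=lambda kg: kg.priority)
        chosen.run(bb)
        trace.append(chosen.name)


bb = Blackboard()
bb.post("segment", "ar")
groups = [
    KnowledgeGroup("word", "syllable", "word", 2, lambda s: {"AR": "are"}.get(s, "?")),
    KnowledgeGroup("syllable", "segment", "syllable", 1, str.upper),
]
trace = schedule(bb, groups)
```

Here the scheduler can only choose among KGs whose input level already holds data, so the syllable KG runs first even though the word KG has the higher priority.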

Hearsay could recognize 1000 words of continuous
speech from several speakers with a limited
syntax with an accuracy of around 90%

Sample of Hearsay-II's Blackboard
Sentence is "Are any by Feigenbaum and Feldman?"

One Neural Network Approach
  • One approach is to build a separate network for
    every word known in the system
  • For a small vocabulary isolated word system, this
    approach may work
  • no need to worry about finding the separation
    between words or the effect that a word ending
    might have on the next word beginning

Notice that NNs have fixed-size inputs. An
input here will be the processed speech signal
in the form of LPCs

Continuous Speech
  • The preceding solution does not work for
    continuous speech
  • Since there is no easy way to determine where one
    word ends and the next begins, we cannot just
    rely on word models
  • Instead, we need phonetic models
  • The problem here is that the sound of a phoneme
    is influenced by the preceding and succeeding
  • a neural network only learns a snapshot of data
    and what we need is context dependent
  • One solution is to use a recurrent NN, which
    remembers the output of the previous input to
    provide context or memory
  • Note that the RNN is much more difficult to
    train, but can solve the speech problem more
    effectively than the normal NN
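
To see what "remembers the output of the previous input" means, here is a minimal sketch of one simple (Elman-style) recurrent layer. The weights are arbitrary, untrained values chosen for the example, and the two-number "frames" are stand-ins for real speech features; the point is only that the same input frame produces a different state depending on what preceded it.

```python
import math


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One step of a simple recurrent layer: new state from input and previous state."""
    h = []
    for j in range(len(bias)):
        total = bias[j]
        total += sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def run_sequence(inputs, w_in, w_rec, bias):
    """Feed frames one at a time, carrying the hidden state forward as context."""
    h = [0.0] * len(bias)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


# Arbitrary, untrained weights (assumed values for illustration only).
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.6, 0.3]]
BIAS = [0.0, 0.0]

# Both sequences end with the same frame but differ in the preceding frame.
states_a = run_sequence([[0.0, 1.0], [1.0, 0.0]], W_IN, W_REC, BIAS)
states_b = run_sequence([[1.0, 0.0], [1.0, 0.0]], W_IN, W_REC, BIAS)
```

A feed-forward network given only the final frame would produce identical outputs for both sequences; the recurrent state is what makes the response context dependent.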

Another Solution: Multiple Nets
Here, there are multiple neural networks for the
various levels. The segmentation module is
responsible for dividing the continuous
signal into segments. The unit level generates
phonetic units. The word module combines possible
phonetic units into words.

HMM Approach
  • The HMM approach views speech recognition as
    finding a path through a graph of connected
    phonetic and/or grammar models, each of which is
    an HMM
  • In this case, the speech signal is the
    observable, and the unit of speech uttered is the
    hidden state
  • Typically, several frames of the speech signal
    are mapped into a codebook (a fancy name for
    selecting one of a set of acoustic
  • Separate HMMs are developed for every phonetic
  • For instance, we might have a /d/ HMM and an /i/
    HMM, etc
  • There may be multiple paths through a single HMM
    to allow for differences in duration caused by
    co-articulation and other effects

Here, the speech problem is one of working
through several layers of HMMs: at the lowest
level are the phonetic HMMs, which combine to
make up word HMMs, which combine to make up
grammar HMMs

Discrete Speech Using HMMs
  • Here, digits spoken one at a time are recognized
  • The HMM word model for a digit consists of 5
    states any of which can repeat
  • each model is trained before being used to adjust
    transition probabilities to the speaker
  • The process is simple: given the LPC values, work
    through each word model and find the one which
    yields the greatest probability using Viterbi

Continuous Speech with HMM
  • Many simplifications made for discrete speech do
    not work for continuous speech
  • HMMs will have to model smaller units, possibly
  • To reduce the search space, use a beam search
  • To ease the word-to-word transitions, use bigrams
    or unigrams

The process is similar to the previous slide
where all phoneme HMMs are searched using
Viterbi, but here, transition probabilities are
included along with more codebooks to handle the
phoneme-to-phoneme transitions and word-to-word

A successful path through the phoneme HMMs
to match the word "seeks"

A trained phoneme HMM for /d/

HMM Codebook
  • HMMs need to have an observable state
  • For the speech problem, the observable is a
    speech signal; this needs to be converted to a
  • The LPC coefficients are mapped to a codebook
    using a nearest-neighbor type of search
  • More codebooks mean larger HMMs (more observable
    states to model) but the more accurate the
    nearest-neighbor matching will be
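
A nearest-neighbor codebook lookup might look like the sketch below. The 2-dimensional vectors and the four-entry codebook are toy values assumed for the example (real LPC vectors have more coefficients and real codebooks far more entries), and squared Euclidean distance is just one possible distance measure.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vector, codebook[i]))


def quantize(vectors, codebook):
    """Replace each feature vector by the index of its nearest codebook entry."""
    return [nearest_code(v, codebook) for v in vectors]


# Toy codebook and feature vectors (assumed values).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.7, 0.9)]
symbols = quantize(features, CODEBOOK)
```

The resulting list of indices is the discrete observation sequence that the HMMs consume.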

HMM systems will commonly use at least 256

HMM/NN Hybrid
  • The strength of the NN is in its low level
    recognition ability
  • The strength of the HMM is in its matching
    ability of the LPC values to a codebook and
    selecting the right phoneme
  • Why not combine them?

Here, a neural network is trained and used to
determine the classification of the frames rather
than a matching codebook; thus, the system can
learn to better match the acoustic information
to a phonetic classification. Phonetic
classifications are gathered together into an
array and mapped to the HMMs

Outstanding Problems
  • Most current solutions are stochastic or neural
    network and therefore exclude potentially useful
    symbolic knowledge which might otherwise aid
    during the recognition problem (e.g., semantics,
    discourse, pragmatics)
  • Speech segmentation: dividing the continuous
    speech into individual words
  • Selection of the proper phonetic units: we have
    seen phonemes and words, but also common are
    diphones, demisyllables and other possibilities;
    speech science still has not determined which
    type of unit is the proper type of unit to model
    for speech recognition
  • Handling intonation, stress, dialect, accent, etc
  • Dealing with very large vocabularies (currently,
    speech systems recognize no more than a few
    thousand words, not an entire language)
  • Accuracy is still too low for reliability (95-98%
    is common)

  • We mainly look at NLU here
  • How can a machine understand natural language?
  • As we know, machines don't understand, so the
    goal is to transform a natural language input
    (speech or text) into an internal representation
  • the system may be one that takes that
    representation and selects an action, or merely
    stores it
  • for instance, if the NLU system is the front end
    of a database, then the goal is to form a DB
    query, submit it, and respond to the user with
    the DB results, or if it is the front end of an
    OS, the goal is to generate an OS command
  • Research began in the 1940s and virtually died in
    the early 1950s until progress was made in the
    field of linguistics in the 1960s
  • Since then, a number of approaches have been
  • Symbolic
  • Stochastic/probabilistic
  • Neural network

NLU (on the left): input text (or speech) and come
to a meaning by syntactic parsing followed
by semantic analysis. NLG (on the right): a
planning problem: how to create a sentence to
convey a given idea?

NLU Processes
  • Morphological analysis
  • Syntactic parsing
  • identifying grammatical categories
  • top down parser
  • bottom up parser
  • Semantic parsing
  • Identifying meaning
  • template matching
  • alternatives are semantic grammars and using
    other forms of representations
  • ambiguity handled by some form of word sense
  • Discourse analysis
  • Combining word meanings for a full sentence
  • handling references
  • applying world knowledge: causality,
    expectations, inferences
  • Pragmatic analysis
  • Speech acts, cognitive states, and beliefs,
    illocutionary acts

Recursive Transition Network parser
Augmented Transition Network parser

An NLU Problem: Ambiguity
  • At the grammatical level
  • words can take on multiple grammatical roles
  • "Our company is training workers" has 3
    syntactic parses
  • "List the sales of the products produced in 1973
    with the products produced in 1972" has 455
  • At the semantic level
  • words can take on multiple meanings even within
    one grammatical category
  • consider the sentence "I made her duck"
  • At the pragmatic and discourse levels
  • what is the meaning of "it sure is cold in here":
    is this just a statement of discomfort, or a
    request to make it warmer?
  • identifying the relationship for references: "The
    chef made the boy some stew. He ate it and
    thanked him." Who do the "he" and "him" refer to?
    What does "it" refer to?
  • Some real US headlines with multiple

Fun Headlines
  • Hospitals are Sued by 7 Foot Doctors
  • Astronaut Takes Blame for Gas in Spacecraft
  • New Study of Obesity Looks for Larger Test Group
  • Chef Throws His Heart into Helping Feed Needy
  • Include your Children when Baking Cookies

Stochastic Solutions Offer Promise
  • HMMs and Bayesian probabilities are often used
  • in either case, build from a corpus the probability
    that a given word, or a given word's transition to
    another word, will have a specific meaning
  • this requires obtaining statistics on word
    frequencies, collocation frequencies (phrases),
    and interpretation frequencies
  • Problem: Zipf's law
  • many words appear infrequently and therefore will
    have low probabilities in stochastic models
  • rather than using the word count to determine the
    probability of a given word, rank the words by
    frequency of occurrence and then use the position
    in the list to compute a probability
  • frequency ∝ 1 / rank
  • In addition, we need to filter collocations to
    remove common phrases like "and so", "one of
    the", etc., to obtain more reasonable frequency
  • otherwise, collocation frequencies will be
    misleading toward very common phrases
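
The rank-based estimate can be sketched as follows. The tiny corpus, the alphabetical tie-breaking and the final normalization (so the values sum to 1) are assumptions added for the example; the slide only states that frequency is proportional to 1 / rank.

```python
from collections import Counter


def zipf_probabilities(tokens):
    """Rank words by count, then assign probability proportional to 1 / rank."""
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically (an arbitrary choice).
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    weights = {w: 1.0 / rank for rank, w in enumerate(ranked, start=1)}
    total = sum(weights.values())
    return {w: weights[w] / total for w in ranked}


corpus = "the cat saw the dog and the cat".split()
probs = zipf_probabilities(corpus)
```

With this scheme the rank-1 word gets exactly twice the probability of the rank-2 word, regardless of the raw counts, which smooths the estimates for rare words.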

Other Solutions
  • Symbolic solutions include
  • Logic-based models for parsing and context-free
    grammars to generate parsers
  • finite state automata such as RTNs and ATNs
  • Parsing by dynamic programming
  • Knowledge representation approaches and case
    grammars (word models) for syntactic and semantic
  • Ad hoc knowledge-based approaches
  • Neural network solutions include
  • Parsing (combining recurrent networks and
    self-organizing maps)
  • Parsing relative clauses using recurrent networks
  • Case role assignment
  • Word sense disambiguation

NLG, Machine Translation
  • NLG: given a concept to relate, translate it
    into a legal statement
  • Like NLU, a mapping process, but this time in
  • much more straightforward than NLU because
    ambiguity is not present
  • but there are many ways to say something, a good
    NLG will know its audience and select the proper
    words through register (audience context)
  • a sophisticated NLG will use reference and
    possibly even parts of speech
  • Machine Translation
  • This is perhaps the hardest problem in NLP
    because it must combine NLU and NLG
  • Simple word-to-word translation is insufficient
  • Meaning, references, idioms, etc must all be
    taken care of
  • Current MT systems are highly inaccurate

Application Areas
  • MS Word: spell checker/corrector, grammar
    checker, thesaurus
  • WordNet
  • Search engines (more generically, information
    retrieval including library searches)
  • Database front ends
  • Question-answering systems within restricted
  • Automated documentation generation
  • News categorization/summation
  • Information extraction
  • Machine translation
  • for instance, web page translation
  • Language composition assistants: help non-native
    speakers with the language
  • On-line dictionaries

Search Engine Technology
  • Search engines generally comprise three
  • Web crawler
  • simple, non-AI, traverser of web pages/sites
  • given web page, accumulate all URLs, add them to
    a queue or stack
  • retrieve next page given the URL from the queue
    (breadth-first) or stack (depth-first/recursive)
  • convert material to text format when possible
  • Summary extractor
  • take web page and summarize its content (possibly
    just create a bag of words, possibly attempt some
    form of classification); store summary,
    classification and URL in DB
  • note: some engines only save summaries, others
    store summaries and the entire web page, but only
    use summaries during the search process
  • create index of terms to web pages (possibly a
    hash table)
  • Search engine portal
  • accept query
  • find all related items in the DB via hashing
  • sort using some form of rating scheme
  • display URLs, titles and possibly brief summaries
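
The three pieces fit together roughly as in the sketch below. Everything here is a stand-in: the "web" is a small in-memory dictionary rather than real HTTP fetches, the summary is just a bag of words, and the rating scheme (number of query words matched) is the simplest one imaginable.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.html": ("ai search engines", ["b.html", "c.html"]),
    "b.html": ("speech recognition with hmm", ["c.html"]),
    "c.html": ("search engines rank pages", ["a.html"]),
}


def crawl(start, web):
    """Breadth-first traversal using a queue; a stack would give depth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls, web):
    """Index of terms to pages: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in web[url][0].split():
            index[word].add(url)
    return index


def search(query, index):
    """Look up each query word, then sort pages by how many words they match."""
    scores = defaultdict(int)
    for word in query.split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))


visited = crawl("a.html", WEB)
index = build_index(visited, WEB)
results = search("search engines", index)
```

A dictionary keyed by word is Python's built-in hash table, so the lookup in `search` corresponds to the "via hashing" step in the portal.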

Page Categorization/Summaries
  • The tricky part of the search engine is to
    properly categorize or summarize a web page
  • Information retrieval techniques are common
  • Keywords from a bag of words
  • Statistical analysis to gauge similarities
    between pages
  • Link information such as page rank, hits, hubs,
  • Filtering
  • Many web pages (e.g., stores) try to take
    advantage of the syntactic nature of search
    engines and place meta tags in their pages that
    contain all English words
  • Filtering is useful in eliminating pages that
    attempt such tricks
  • Sorting
  • Once web pages have been found that match the
    given query, how are they sorted?
  • using word count, giving extra credit if any of
    the words are found in the page's title or the
    link text, examining font size and style for
    importance of the words in the document, etc.

Page Ranking
  • Based on the idea of academic citation to
    determine something's importance
  • PR(A) = (1 - d) + d (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))
  • PR(A): page rank of page A
  • d: a damping factor between 0 and 1 (usually
    set to .85)
  • C(A): number of links leaving page A
  • T1..Tn are the n pages that point at A
  • The page rank corresponds to the principal
    eigenvector of a normalized link matrix (that
    is, a matrix of pages and their links)
  • One way to view page rank is to think of an
    average web surfer who randomly is walking around
    pages by clicking on links (and never clicking
    the back button)
  • the page rank is in essence the probability that
    this page will be reached randomly
  • the damping factor is the likelihood that the
    surfer will get bored at this page and request
    another random page (rather than following a
    specific link of interest)
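
One way to compute page rank is to apply the formula repeatedly until the values settle. The sketch below does this for a made-up three-page link graph; the fixed iteration count and the starting value of 1.0 per page are assumptions, and it assumes every page has at least one outgoing link (pages without links would need extra handling).

```python
def page_rank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links: page -> list of pages it links to.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {}
        for a in pages:
            incoming = [t for t in pages if a in links[t]]
            new[a] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming)
        pr = new
    return pr


# Toy link graph (assumed): A links to B and C, B links to C, C links to A.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = page_rank(LINKS)
```

In this form of the formula the ranks sum to the number of pages rather than to 1; dividing each value by the page count gives the probability reading used in the random-surfer picture.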

Google's Architecture
  • There are numerous distributed crawlers working
    all the time
  • Web pages are compressed before being stored
  • Each page has a unique document ID provided by
    the store server
  • The indexer uncompresses files and parses them
    into word occurrences including the position in
    the document of the given word
  • These word occurrences are stored in barrels to
    create an index of word-to-document mappings
    (using ISAM)
  • The Sorter re-sorts the barrel information by word
    to create a reverse index
  • The URL resolver converts relative URLs into
    absolute URLs
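
Two of these steps are easy to sketch: resolving relative URLs and re-sorting word occurrences into a word-keyed index. This is an illustration only, not Google's implementation: the example URLs and document IDs are made up, whitespace splitting stands in for real parsing, and barrels, compression and ISAM storage are not modelled.

```python
from collections import defaultdict
from urllib.parse import urljoin


def resolve(base_url, href):
    """Convert a (possibly relative) link into an absolute URL."""
    return urljoin(base_url, href)


def forward_index(doc_id, text):
    """Word occurrences for one document: (doc_id, word, position)."""
    return [(doc_id, word, pos) for pos, word in enumerate(text.lower().split())]


def inverted_index(hits):
    """Re-sort occurrences by word to get word -> [(doc_id, position), ...]."""
    index = defaultdict(list)
    for doc_id, word, pos in sorted(hits, key=lambda h: (h[1], h[0], h[2])):
        index[word].append((doc_id, pos))
    return dict(index)


absolute = resolve("http://example.com/docs/index.html", "../img/a.png")
hits = forward_index(1, "Web search") + forward_index(2, "search engine")
index = inverted_index(hits)
```

Keeping the position alongside the document ID is what later allows phrase and proximity matching.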

An Architecture for Improved Search
  • Search engines are limited in that they do not
    take advantage of personalized knowledge
  • This is of course understandable given that
    search engines have to support millions of users
  • Imagine that you had your own search engine DB,
    could you provide user-specific knowledge to
    improve search?

Three different techniques were tried using the
architecture to the left. A search engine
provides a number (millions?) of URLs, which are
downloaded and kept locally. User personalized
knowledge is applied to filter the items
found locally in order to provide only those
pages most likely to be of use

User Profiles
Partial user profile accumulated from text
files (word and frequency listed):

  debugging     1100
  dec           1785
  degree        4138
  def           4938
  default       1349
  define        2752
  defined       1140
  department    2587
  dept          1328
  description   2691
  design        1780
  deskset       6472
  development   1403
  different     2336
  digital       1424
  directory     2517
  disk          1907
  distributed   5127
  • The first approach merely created a bag of words
    annotated by their frequency
  • Both the words and the frequency were derived by
    accumulating text in user stored files
  • Word counts were summed based on which words
    appeared in the document; this score was used to
    order the retrieved pages
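
The scoring step can be sketched as below, using four of the word counts from the profile listing. The page texts and the helper names are invented for the example, and each profile word is counted once per page, which is one reading of "based on which words appeared in the document".

```python
# A few entries taken from the profile listing above.
PROFILE = {"debugging": 1100, "design": 1780, "disk": 1907, "distributed": 5127}


def profile_score(text, profile):
    """Sum the profile counts of the distinct profile words found in the text."""
    return sum(profile.get(word, 0) for word in set(text.lower().split()))


def rank_pages(pages, profile):
    """Order retrieved pages from highest to lowest profile score."""
    return sorted(pages, key=lambda url: -profile_score(pages[url], profile))


# Hypothetical retrieved pages: URL -> page text.
PAGES = {
    "p1": "distributed disk design",
    "p2": "debugging tips",
    "p3": "cooking recipes",
}
ordered = rank_pages(PAGES, PROFILE)
```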

User-Defined Knowledge Base
An alternative approach is to allow the user to
define his/her own knowledge base of topics,
people, phrases and keywords that are of
importance. Classification is done through a
user-defined hierarchy where each concept has its
own matching knowledge. Two concepts are shown to
the left; threshold means the number of matches
required for the category to establish

Learning-Driven Approach
  • Learning what bag of words best represents a
    category by having viewers vote on which web
    pages are relevant for a given query and which
    are not
  • Voting causes weights of the words in the bag to
    be altered
  • Implementations have included linear matrix
    forms, perceptrons and neural networks
  • this approach can also be used for recommendation
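
A perceptron-style version of the voting update might look like this. It is a sketch under assumptions: the learning rate, the zero decision threshold, the example words and the rule "only adjust when the current prediction disagrees with the vote" are choices made for illustration, not details given in the slides.

```python
def score(weights, words):
    """Sum of the weights of the distinct words on a page."""
    return sum(weights.get(w, 0.0) for w in set(words))


def vote(weights, words, relevant, rate=0.1):
    """Adjust word weights when the current prediction disagrees with the vote."""
    predicted = score(weights, words) > 0
    if predicted != relevant:
        delta = rate if relevant else -rate
        for w in set(words):
            weights[w] = weights.get(w, 0.0) + delta
    return weights


weights = {}
vote(weights, ["hmm", "speech"], relevant=True)        # viewer says: relevant
vote(weights, ["speech", "therapy"], relevant=False)   # viewer says: not relevant
relevant_score = score(weights, ["hmm", "speech"])
irrelevant_score = score(weights, ["speech", "therapy"])
```

After the two votes, "hmm" carries positive weight, "therapy" negative weight, and the shared word "speech" ends up neutral, so the bag of words has started to separate the two kinds of page.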