Title: acemindia
1Department of Computer Science Engineering
Artificial Intelligence - An Introduction
2What is AI?
- Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."
- Artificial Intelligence exists when a machine can exhibit human-like skills such as learning, reasoning, and problem solving.
3History of AI
4Does AI have applications?
- Autonomous planning and scheduling of tasks aboard a spacecraft
- Beating Garry Kasparov in a chess match
- Steering a driver-less car
- Understanding language
- Robotic assistants in surgery
- Monitoring trade in the stock market to see if
insider trading is going on
5Applications
6Goals of AI
- Problem solving
- Problem-solving agents
- In Artificial Intelligence, search techniques are universal problem-solving methods. Rational agents, or problem-solving agents, mostly use these search strategies or algorithms to solve a specific problem and provide the best result.
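As one illustration of a search strategy, here is a minimal sketch of breadth-first search over a small state graph. The graph, the state names, and the function name `breadth_first_search` are assumptions made up for this example; they do not come from the slides.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])   # queue of partial paths, oldest first
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None                   # goal is unreachable from start

# Hypothetical state graph: each key lists the states reachable in one step.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
path = breadth_first_search(graph, "A", "F")
print(path)
```

Breadth-first search explores states level by level, so the first path that reaches the goal uses the fewest steps. Other strategies trade this guarantee for speed or memory.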
7An Agent
- Anything that can gather information about its environment and take action based on that information.
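The perceive-then-act idea can be sketched with a very simple reflex agent. The two-location "vacuum world" below, the action names, and the helper `run` are assumptions chosen for illustration, not something defined in the slides.

```python
def reflex_vacuum_agent(percept):
    """Map a percept (location, status) directly to an action."""
    location, status = percept
    if status == "dirty":
        return "suck"
    return "move_right" if location == "A" else "move_left"

def run(world, location, steps):
    """Run the perceive-act loop and return the list of actions taken."""
    actions = []
    for _ in range(steps):
        percept = (location, world[location])   # gather information
        action = reflex_vacuum_agent(percept)   # decide what to do
        actions.append(action)
        if action == "suck":                    # act on the environment
            world[location] = "clean"
        elif action == "move_right":
            location = "B"
        else:
            location = "A"
    return actions

# Hypothetical environment with two locations, both dirty at the start.
world = {"A": "dirty", "B": "dirty"}
actions = run(world, "A", 4)
print(actions)
```

This agent keeps no memory and reacts only to the current percept. More capable agents add an internal model, goals, or learning on top of the same loop.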
8Components of a Basic Speech Recognition System
- A speech-capturing device: It consists of a microphone, which converts sound waves into electrical signals, and an analog-to-digital converter, which samples and digitizes the analog signal to obtain discrete data that the computer can process.
- A digital signal module or processor: It processes the raw speech signal, for example by converting it to the frequency domain and retaining only the required information.
- Preprocessed signal storage: The preprocessed speech is stored in memory for the subsequent recognition steps.
- Reference speech patterns: The system holds predefined speech patterns or templates in memory, to be used as references for matching.
- Pattern matching algorithm: The unknown speech signal is compared with the reference speech patterns to determine the actual words or the pattern of words.
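The pattern matching step can be sketched as a nearest-template lookup. This is a simplified illustration: it assumes each utterance has already been reduced to a fixed-length feature vector, and the template values, labels, and function names are invented for the example. Real systems must also handle utterances of different lengths.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_template(features, templates):
    """Return the label of the stored template closest to the input features."""
    return min(templates,
               key=lambda label: euclidean_distance(features, templates[label]))

# Hypothetical reference patterns stored in memory (made-up feature values).
templates = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.4, 0.4, 0.9],
}
unknown = [0.85, 0.2, 0.25]      # features of the unknown speech signal
print(match_template(unknown, templates))
```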
9Working of the System
Speech can be seen as an acoustic waveform, i.e., a signal carrying message information. The microphone converts this acoustic waveform into an analog electrical signal. The analog-to-digital converter turns the analog signal into digital samples by measuring the wave precisely at discrete intervals. The digitized signal is a stream of samples taken 16,000 times per second; in this form it is not suitable for the actual recognition process, because patterns cannot be located easily.
To extract the actual information, the time-domain signal is converted to the frequency domain. The digital signal processor does this using the FFT (Fast Fourier Transform) technique. The digital signal is analyzed in components of 1/100th of a second each, and the frequency spectrum of each component is computed. In other words, the digitized signal is segmented into small parts described by their frequency amplitudes. Each segment, or frequency graph, represents one of the different sounds made by human beings. The computer then matches the unknown segments against the stored phonemes of the particular language.
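The framing and frequency-conversion steps can be sketched as follows. To stay self-contained, this sketch uses a naive discrete Fourier transform rather than an FFT; the FFT computes the same spectrum, only faster. The synthetic 1000 Hz tone and all function names are assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 16000               # samples per second, as in the text
FRAME_SIZE = SAMPLE_RATE // 100   # 160 samples = 1/100th of a second

def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Cut the signal into consecutive frames, dropping any remainder."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def magnitude_spectrum(frame):
    """Naive DFT: magnitude of each frequency bin up to half the sample rate."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def dominant_frequency(frame, sample_rate=SAMPLE_RATE):
    """Frequency in Hz of the strongest bin in the frame's spectrum."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)

# Synthetic input (an assumption): 0.05 seconds of a pure 1000 Hz tone.
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
frames = split_into_frames(signal)
print(len(frames), [dominant_frequency(f) for f in frames])
```

With 160 samples per frame at 16,000 samples per second, each frequency bin is 100 Hz wide, so the spectrum of one frame is only a coarse picture of the sound in that 1/100th of a second.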
11Factors on which a Speech Recognition System Depends
- The speech recognition system depends on the following factors:
- Isolated words: There needs to be a pause between consecutive words, because continuous words can overlap, making it difficult for the system to tell where a word starts or ends.
- Single speaker: Several speakers giving speech input at the same time can cause overlapping signals and interruptions. Most speech recognition systems in use are speaker-dependent.
- Vocabulary size: Languages with a large vocabulary are harder to handle in pattern matching than those with a small vocabulary, since ambiguous words are less likely in the latter.
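The role of pauses between isolated words can be sketched with a simple energy threshold: frames with energy above the threshold are treated as speech, and a run of such frames is treated as one word. The toy signal, the frame size, and the threshold value are assumptions picked for the example, not values from the slides.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each consecutive frame."""
    return [sum(x * x for x in samples[i:i + frame_size]) / frame_size
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def find_word_segments(energies, threshold):
    """Return (start_frame, end_frame) pairs, end exclusive, for loud runs."""
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                       # a word begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))     # silence ends the word
            start = None
    if start is not None:                   # signal ended mid-word
        segments.append((start, len(energies)))
    return segments

# Toy signal: silence, a "word", silence, a second "word", silence.
signal = [0.0] * 4 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.8, -0.8] * 2 + [0.0] * 4
energies = frame_energies(signal, frame_size=4)
segments = find_word_segments(energies, threshold=0.01)
print(segments)
```

Without the silent frames between the two bursts, the loop would report a single long segment, which is the word-boundary problem described above.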
12Components of ASR
LEXICON MODEL, ACOUSTIC MODEL, LANGUAGE MODEL
13Lexicon: The lexicon is the first step in decoding speech. Creating a comprehensive lexical design for an ASR system involves including the fundamental elements of both spoken language (the audio input the ASR system receives) and written vocabulary (the text the system sends out).
Acoustic Model: Acoustic modeling involves separating an audio signal into small time frames. Acoustic models analyze each frame and provide the probability of different phonemes occurring in that section of audio. Simply put, acoustic models aim to predict which sound is spoken in each frame.
Language Model: Today's ASR systems employ natural language processing (NLP) to help computers understand the context of what a speaker says. Language models recognize the intent of spoken phrases and use that knowledge to compose word sequences. They operate in a similar way to acoustic models, using deep neural networks trained on text data to estimate the probability of which word comes next in a phrase.
Together, the lexicon, acoustic model, and language model enable ASR systems to make close-to-accurate predictions about the words and sentences in an audio input.
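The idea of estimating which word comes next can be sketched with a bigram count model. This is a deliberately simple stand-in for the deep neural networks mentioned above, and the tiny corpus and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev):
    """Estimate P(next word | previous word) by relative frequency."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}                 # previous word never seen in training
    return {word: n / total for word, n in counts[prev].items()}

# Hypothetical training text.
corpus = [
    "turn on the light",
    "turn off the light",
    "turn on the radio",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)
```

If the acoustic model cannot decide between two similar-sounding words after "the", probabilities like these give the recognizer a reason to prefer the more likely one.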
14How Does ASR Work?
- In the simplest terms, speech recognition occurs when a computer receives audio input from a person speaking, processes that input by breaking down the various components of speech, and then transcribes that speech to text.
- Some ASR systems are speaker-dependent and must be trained to recognize particular words and speech patterns. These are essentially the voice-recognition systems used in your smart devices. Before the ASR-powered voice assistant starts working, you need to say specific words and phrases into your phone so that it learns to identify your voice.
- Other ASR systems are speaker-independent. These systems do not require any training by the user. Speaker-independent systems can recognize spoken words regardless of the speaker, which makes them practical solutions for business applications like interactive voice response (IVR).
15ASR Use Cases
- From speech recognition's mid-twentieth-century origins to its multi-industry applications today, the use cases for ASR technology are far-reaching. ASR made it out of the computer science laboratories and is now integrated into our everyday lives.
- Voice assistants: According to a 2020 survey conducted by NPR and Edison Research, 63% of respondents said they use a voice assistant. The ability to use voice commands to complete tasks like opening mobile apps, sending a text message, or searching the web affords users a greater level of convenience.
- Language learning: For people engaged in self-guided language study, apps using speech-recognition tools put them a step closer to a comprehensive learning experience. Apps like Busuu and Babbel use ASR technology to help students practice their pronunciation and accents in their target languages. Using these apps, a student speaks into their phone or computer in the target language. The ASR software listens to that voice input and analyzes it; if it matches what the system identifies as the correct pronunciation, the app informs the learner. If the student's voice input doesn't match what the ASR knows to be correct, the app informs the student of the mispronunciation as well.
- Transcription services: One of the first widespread use cases of ASR was the simple transcription of speech. Speech-to-text services offer convenience in many contexts and open the door to improved audio and video accessibility. Health care practitioners use dictation products like Dragon NaturallySpeaking to take hands-free notes while attending to patients. ASR captioning also allows real-time transcription of live video, which lets a broader audience access the media.
- Call centers: ASR is crucial for automating processes in businesses with extensive customer support demands. With an influx of callers, companies need a way to handle a vast amount of customer communication efficiently. ASR technology is one of the main mechanisms involved in smart IVR, a system that automates routine inbound communications as well as large-scale outbound call campaigns.
16Challenges and Issues in ASR
- Imprecision and false interpretations
- Time and lack of efficiency
- Accents and local differences
- Background noise and loud environments
- Privacy and data security
17Aravali College of Engineering And Management
Jasana, Tigoan Road, Neharpar, Faridabad, Delhi NCR
Toll Free Number: 91-8527538785
Website: www.acem.edu.in