acemindia - PowerPoint PPT Presentation

1
Department of Computer Science Engineering
Artificial Intelligence: An Introduction
2
What is AI?
  • The term Artificial Intelligence combines two
    words: "artificial," meaning "man-made," and
    "intelligence," meaning "thinking power." AI
    therefore means "a man-made thinking power."
  • Artificial Intelligence exists when a machine
    exhibits human-like skills such as learning,
    reasoning, and problem solving.

3
History of AI
4
Does AI have applications?
  • Autonomous planning and scheduling of tasks
    aboard a spacecraft
  • Beating Garry Kasparov in a chess match
  • Steering a driver-less car
  • Understanding language
  • Robotic assistants in surgery
  • Monitoring trade in the stock market to see if
    insider trading is going on

5
Applications
6
Goals of AI
  • Problem solving
  • Problem-solving agents
  • In Artificial Intelligence, search techniques are
    universal problem-solving methods. Rational
    agents, or problem-solving agents, mostly use
    these search strategies or algorithms to solve a
    specific problem and return the best result.
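
To make the idea of search as a problem-solving method concrete, here is a minimal sketch of one common search strategy, breadth-first search. The slide does not name a specific algorithm, so the choice of breadth-first search, the function name breadth_first_search, and the toy state graph are all assumptions made for illustration.

```python
from collections import deque


def breadth_first_search(graph, start, goal):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])  # queue of partial paths still to extend
    visited = {start}            # states already reached
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for neighbor in graph.get(state, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None


# Hypothetical state space: each key is a state, each value lists the
# states reachable from it in one step.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

path = breadth_first_search(toy_graph, "A", "F")
print(path)
```

Breadth-first search explores states level by level, so the first path that reaches the goal uses the fewest steps. Whether that counts as the "best" result depends on the problem; other strategies weigh step costs differently.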

7
An Agent
  • Anything that can gather information about its
    environment and take action based on that
    information.
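
As a rough illustration of this definition, the sketch below models a very simple agent that reads one value from its environment and chooses an action from it. The thermostat scenario, the class name ThermostatAgent, and the threshold values are hypothetical and are not taken from the presentation.

```python
class ThermostatAgent:
    """A simple reflex agent: perceive the temperature, then act on it."""

    def __init__(self, target_celsius, tolerance=1.0):
        self.target = target_celsius
        self.tolerance = tolerance

    def act(self, percept_celsius):
        # The action depends only on the current percept.
        if percept_celsius < self.target - self.tolerance:
            return "heat"
        if percept_celsius > self.target + self.tolerance:
            return "cool"
        return "idle"


agent = ThermostatAgent(target_celsius=21.0)
actions = [agent.act(t) for t in (17.5, 21.3, 25.0)]
print(actions)
```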


8
Components of a Basic Speech Recognition System
  • Speech-capturing device: It consists of a
    microphone, which converts sound waves into
    electrical signals, and an analog-to-digital
    converter, which samples and digitizes the analog
    signal to obtain discrete data that the computer
    can process.
  • Digital signal module or processor: It processes
    the raw speech signal, for example by converting
    it to the frequency domain and retaining only the
    required information.
  • Preprocessed signal storage: The preprocessed
    speech is stored in memory for the subsequent
    speech recognition steps.
  • Reference speech patterns: The system holds
    predefined speech patterns, or templates, in
    memory to serve as references for matching.
  • Pattern matching algorithm: The unknown speech
    signal is compared with the reference speech
    patterns to determine the actual words or
    sequence of words.
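
The pattern matching step can be sketched as a nearest-template lookup. This is a simplified illustration, not the method used by any particular system: it assumes every utterance has already been reduced to a fixed-length feature vector, and the feature values, the word labels, and the function names are invented for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_pattern(unknown, references):
    """Return the label of the stored template closest to the unknown input."""
    return min(
        references,
        key=lambda label: euclidean_distance(unknown, references[label]),
    )


# Made-up feature vectors standing in for the stored reference patterns.
reference_patterns = {
    "yes": [0.9, 0.1, 0.4, 0.2],
    "no": [0.2, 0.8, 0.1, 0.6],
    "stop": [0.5, 0.5, 0.9, 0.1],
}

unknown_signal = [0.8, 0.2, 0.5, 0.2]
best = match_pattern(unknown_signal, reference_patterns)
print(best)
```

Real speech varies in length and speed, so practical systems need a comparison that tolerates such variation; the fixed-length vectors here only keep the example short.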
9
Working of the System
10
Working of the System
Speech can be viewed as an acoustic waveform,
i.e., a signal carrying message information. The
microphone converts this acoustic waveform into
an analog electrical signal. The analog-to-digital
converter then turns the analog signal into
digital samples by taking precise measurements of
the wave at discrete intervals. The digitized
signal is a stream of samples taken 16,000 times
per second. In this form it is not suitable for
speech recognition, because patterns cannot be
located easily. To extract the actual information,
the time-domain signal is converted to the
frequency domain. The digital signal processor
does this using the Fast Fourier Transform (FFT).
The digital signal is analyzed in segments of
1/100th of a second, and the frequency spectrum
of each segment is computed. In other words, the
digitized signal is divided into small segments,
each described by its frequency amplitudes. Each
segment, or frequency graph, represents one of
the different sounds made by human beings. The
computer then matches the unknown segments
against the stored phonemes of the particular
language.
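
The segmentation and frequency analysis described above can be sketched as follows. The sampling rate of 16000 samples per second and the segment length of 1/100th of a second come from the text; everything else is an assumption for illustration. In particular, the sketch uses a naive discrete Fourier transform instead of an FFT, which yields the same spectrum more slowly but needs no external library, and the input is a synthetic pair of tones rather than recorded speech.

```python
import cmath
import math

SAMPLE_RATE = 16000              # samples per second, as stated in the text
FRAME_SIZE = SAMPLE_RATE // 100  # 1/100th of a second = 160 samples


def dft_magnitudes(frame):
    """Naive discrete Fourier transform; magnitudes for bins 0..N/2."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequencies(samples):
    """Split samples into 10 ms frames; return each frame's strongest frequency in Hz."""
    result = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        mags = dft_magnitudes(samples[start:start + FRAME_SIZE])
        # Skip bin 0 (the constant offset) when looking for the peak.
        peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
        result.append(peak_bin * SAMPLE_RATE / FRAME_SIZE)
    return result


def tone(freq_hz, n_samples):
    """Synthetic sine wave used as a stand-in for microphone input."""
    return [
        math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


# 20 ms of a 500 Hz tone followed by 20 ms of a 2000 Hz tone.
signal = tone(500, 2 * FRAME_SIZE) + tone(2000, 2 * FRAME_SIZE)
freqs = dominant_frequencies(signal)
print(freqs)
```

With 160 samples per segment, neighbouring frequency bins are 100 Hz apart, so each segment yields a coarse spectrum rather than an exact pitch.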
11
Factors on which Speech Recognition system
depends
  • The speech recognition system depends on the
    following factors:
  • Isolated words: There needs to be a pause between
    consecutive spoken words, because continuous
    words can overlap, making it difficult for the
    system to tell where a word starts or ends.
  • Single speaker: Several speakers giving speech
    input at the same time cause overlapping signals
    and interruptions. Most speech recognition
    systems in use are speaker-dependent.
  • Vocabulary size: Languages with a large
    vocabulary are harder to handle in pattern
    matching than those with a small vocabulary,
    because a small vocabulary contains fewer
    ambiguous words.
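
The isolated-words factor can be illustrated with a small silence-based segmenter. This is only a sketch under simplifying assumptions: the input is a toy list of amplitude values, and the threshold and min_silence parameters as well as the function name segment_words are hypothetical.

```python
def segment_words(samples, threshold=0.1, min_silence=3):
    """Return (start, end) index pairs for runs of sound separated by silence.

    A sample is silent when its absolute amplitude is below `threshold`.
    A word ends once `min_silence` consecutive silent samples are seen.
    The end index is exclusive.
    """
    segments = []
    start = None
    silent_run = 0
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended while a word was still open.
        segments.append((start, len(samples) - silent_run))
    return segments


# Toy amplitude values: two bursts of sound with a clear pause between them.
envelope = [0.0, 0.0, 0.6, 0.8, 0.7, 0.0, 0.0, 0.0, 0.0, 0.5, 0.9, 0.4, 0.0]
words = segment_words(envelope)
print(words)
```

If the pause is shorter than min_silence, the two bursts are reported as a single segment, which mirrors the difficulty with continuous speech described in the list.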

12
Components of ASR
LEXICON MODEL, ACOUSTIC MODEL, LANGUAGE MODEL
13
  • Lexicon: The lexicon is the primary step in
    decoding speech. A comprehensive lexical design
    for an ASR system covers the fundamental elements
    of both the spoken language (the audio input the
    ASR system receives) and the written vocabulary
    (the text the system outputs).
  • Acoustic model: Acoustic modeling separates an
    audio signal into small time frames. The acoustic
    model analyzes each frame and estimates the
    probability of each phoneme in that section of
    audio. Simply put, acoustic models aim to predict
    which sound is spoken in each frame.
  • Language model: Today's ASR systems employ
    natural language processing (NLP) to help
    computers understand the context of what a
    speaker says. Language models recognize the
    intent of spoken phrases and use that knowledge
    to compose word sequences. They operate in a
    similar way to acoustic models, using deep neural
    networks trained on text data to estimate the
    probability of which word comes next in a phrase.
Together, the lexicon, acoustic model, and
language model enable ASR systems to make
close-to-accurate predictions about the words and
sentences in an audio input.
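
The interplay of the three components can be sketched with a toy decoder. All data in it is invented for illustration: the lexicon entries, the per-frame phoneme probabilities, and the next-word probabilities. Real systems obtain these from trained models, and the simple lookup tables here only stand in for them.

```python
# Hypothetical lexicon: maps written words to phoneme sequences.
lexicon = {
    "two": ["T", "UW"],
    "too": ["T", "UW"],
    "tea": ["T", "IY"],
}

# Hypothetical acoustic model output: phoneme probabilities per frame.
frame_probs = [
    {"T": 0.9, "D": 0.1},
    {"UW": 0.7, "IY": 0.3},
]

# Hypothetical language model: probability of a word given the previous word.
next_word_prob = {
    ("have", "two"): 0.30,
    ("have", "too"): 0.05,
    ("have", "tea"): 0.20,
    ("me", "two"): 0.02,
    ("me", "too"): 0.40,
}


def acoustic_score(word):
    """Probability that the frames match the word's phonemes, one per frame."""
    score = 1.0
    for probs, phoneme in zip(frame_probs, lexicon[word]):
        score *= probs.get(phoneme, 0.0)
    return score


def decode(previous_word):
    """Pick the word with the best combined acoustic and language score."""
    scores = {
        word: acoustic_score(word) * next_word_prob.get((previous_word, word), 0.0)
        for word in lexicon
    }
    return max(scores, key=scores.get), scores


best_word, scores = decode("have")
print(best_word)
```

In this toy data "two" and "too" sound identical, so the acoustic score cannot separate them. The language model decides: after "have" the decoder picks "two", and after "me" it picks "too".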
14
How Does ASR Work?
  • In the simplest terms, speech recognition occurs
    when a computer receives audio input from a
    person speaking, processes that input by breaking
    down the various components of speech, and then
    transcribes that speech to text.
  • Some ASR systems are speaker-dependent and must
    be trained to recognize particular words and
    speech patterns. These are essentially the
    voice-recognition systems used in smart devices:
    before the ASR-powered voice assistant starts
    working, you say specific words and phrases into
    your phone so that it learns to identify your
    voice.
  • Other ASR systems are speaker-independent. These
    systems do not require training by the individual
    speaker and can recognize spoken words regardless
    of who is talking. Speaker-independent systems
    are practical solutions for business applications
    such as interactive voice response (IVR).

15
ASR Use Cases
  • From speech recognition's mid-twentieth-century
    origins to its multi-industry applications today,
    the use cases for ASR technology are
    far-reaching. ASR has made it out of the computer
    science laboratories and is now integrated into
    our everyday lives.
  • Voice assistants: According to a 2020 survey
    conducted by NPR and Edison Research, 63% of
    respondents said they use a voice assistant. The
    ability to use voice commands for tasks like
    opening mobile apps, sending a text message, or
    searching the web gives users a greater level of
    convenience.
  • Language learning: For people engaged in
    self-guided language study, apps with
    speech-recognition tools bring them a step closer
    to a comprehensive learning experience. Apps like
    Busuu and Babbel use ASR technology to help
    students practice pronunciation and accent in
    their target languages. The student speaks into
    a phone or computer in the target language. The
    ASR software listens to that voice input and
    analyzes it. If the input matches what the system
    identifies as the correct pronunciation, it tells
    the learner so. If it does not match, it informs
    the student of the mispronunciation.
  • Transcription services: One of the first
    widespread use cases of ASR was the simple
    transcription of speech. Speech-to-text services
    offer convenience in many contexts and open the
    door to improved audio and video accessibility.
    Health care practitioners use dictation products
    like Dragon NaturallySpeaking to take hands-free
    notes while attending to patients. ASR captioning
    also enables real-time transcription of live
    video, which makes the media accessible to a
    broader audience.
  • Call centers: ASR is crucial for automating
    processes in businesses with extensive customer
    support demands. With an influx of callers,
    companies need a way to handle a vast amount of
    customer communication efficiently. ASR
    technology is one of the main mechanisms behind
    smart IVR, a system that automates routine
    inbound communications as well as large-scale
    outbound call campaigns.

16
Challenges and Issues in ASR
  • Imprecision and false interpretations
  • Time and lack of efficiency
  • Accents and local differences
  • Background noise and loud environments
  • Privacy and data security

17
Aravali College of Engineering and Management
Jasana, Tigoan Road, Neharpar, Faridabad, Delhi NCR
Toll Free Number: 91-8527538785
Website: www.acem.edu.in