Title: Multimedia Data Processing
1Multimedia Data Processing
- MIDI and audio processing
- Compression encodings
- Generation and transmission of media stream
- Multimedia file formats
2Introduction to Multimedia
- Multimedia means two or more continuous media
- that is, media that have to be played during some
well-defined time interval, usually with some
user interaction
- The media can be
- text
- still image
- audio
- video
3Embedding Audio into Multimedia Programs
- Introduction
- Over the past few years multimedia has been transformed from a
static page into a dynamic mix of sound, graphics,
animation, video and text
- A full multimedia system includes all media, of
which sound and voice are two of the most important
factors to research and develop
4Types of sound files
- Midi files
- Digital audio files
5Musical Instrument Digital Interface (MIDI)
- A MIDI message conveys one musically significant
event
- Typical events are
- a key being pressed
- a foot pedal being released, etc.
- The status byte indicates the event, and the data
bytes give parameters, such as
- which key was pressed and
- with what velocity it was moved
- Shorthand representation of music stored in
numeric form.
- Codes for components (instruments and
electronic sounds)
- Every instrument has a MIDI code assigned to it
- E.g. a grand piano is 0, a violin is 40
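The status-byte / data-byte layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full MIDI parser: the function name `parse_midi_message` and the returned dictionary layout are assumptions made for this example, and only three message types (Note On, Note Off, Program Change) are handled.

```python
def parse_midi_message(msg: bytes) -> dict:
    """Decode one short MIDI channel message (illustrative subset only)."""
    status, data = msg[0], msg[1:]
    kind = status & 0xF0      # upper 4 bits of the status byte: event type
    channel = status & 0x0F   # lower 4 bits of the status byte: channel 0-15

    if kind in (0x90, 0x80) and len(data) == 2:
        key, velocity = data
        # A Note On with velocity 0 is conventionally treated as a Note Off.
        pressed = kind == 0x90 and velocity > 0
        return {"event": "note_on" if pressed else "note_off",
                "channel": channel, "key": key, "velocity": velocity}
    if kind == 0xC0 and len(data) == 1:
        # Program Change selects the instrument code (e.g. 0 or 40 above).
        return {"event": "program_change", "channel": channel,
                "program": data[0]}
    raise ValueError("message type not covered by this sketch")


key_down = parse_midi_message(bytes([0x90, 60, 100]))  # key 60 pressed
violin = parse_midi_message(bytes([0xC0, 40]))         # select instrument 40
```

A whole key press therefore travels as three bytes, which is where the compactness of MIDI compared with a sampled waveform comes from.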
6MIDI (Cont.)
- The heart of every MIDI system is a synthesizer (or
a computer) that
- accepts messages
- generates music from them
- The synthesizer understands all 128 instrument codes (0 to 127)
- The advantage of transmitting music using MIDI
compared to sending a digitized waveform is the
enormous reduction in bandwidth, often by a
factor of 1000
- The disadvantage is that the receiver needs a
MIDI synthesizer to reconstruct the music again,
and different ones may give slightly different
renditions
- MIDI-encoded files use the .midi, .mid or .rmi extensions
7Types of sound files
- MIDI
- Advantages
- Much more compact than digital audio files.
- The length of this kind of file can be changed
without degrading the audio quality.
- MIDI can be manipulated in ways that are
impossible with digital audio.
- Disadvantages
- Not digitized sound
- Not hi-fi at all
- MIDI cannot properly describe a real sound
8Types of sound files
- Digital audio
- Describes the actual representation of a sound
- Digital data: the instantaneous amplitude of a sound
at discrete slices of time
- A digitized sound is a sampled sound
- The more often a sample is taken, the more data are
stored about the sound
- Large data storage files required
9Types of sound files
- Digital audio
- Examples
- The highest sampling rate typically offered is 44.1 kHz; it is
CD-quality recording, the recognized standard of
audio quality
- The lowest sampling rate typically offered is 5.5 kHz; it sounds like a
bad telephone connection
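To make the storage cost of these sampling rates concrete, the following sketch computes the size of uncompressed sampled audio. The sample widths and channel counts (16-bit stereo for CD quality, 8-bit mono for the low rate) are assumptions for the example; the slides give only the sampling rates.

```python
def pcm_bytes(sample_rate_hz: int, bits_per_sample: int,
              channels: int, seconds: int) -> int:
    """Size in bytes of uncompressed sampled audio."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


# One minute at each rate (sample width and channels are assumed values).
cd_quality = pcm_bytes(44_100, 16, 2, 60)   # 10,584,000 bytes
low_quality = pcm_bytes(5_500, 8, 1, 60)    # 330,000 bytes
```

Under these assumptions one minute of CD-quality sound needs about 32 times as much storage as one minute at the low rate.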
10Setting and recording level
- A distorted recording sounds terrible.
- A signal that is too large gives an unpleasant result.
- To avoid distortion, don't exceed the input limit
of the sound card.
- If that happens, lower the volume.
11Editing digital recording
- Basic sound editing operations
- Trimming
- Involves removing dead air and any
unnecessary extra time from a recording.
- Splicing and Assembly
- Remove the extraneous noises that were added to
a recording (cutting and pasting operations)
- Volume adjustments
- Different recordings have different
volume levels.
- To assemble them, the volume must be
consistent.
- It is necessary to select all the data in the
file, and raise or lower the overall volume
12Hardware required for recording voice
- The microphone sends impulses
- The sound card makes it possible to sample these impulses
13Software for audio editing
- Professional sound editing software can be
combined with sound cards to create, record and
edit audio files.
- Tools and effects
- Program settings for voice recording.
- Volume.
14Software for audio editing
- Program setting for voice recording
- Noise gate
- DC offset
- Equalizer
15Software for audio editing
- Noise gate
- Removes signals below a set threshold (threshold
level).
- Used to remove noise from silent breaks in a sound
file.
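A noise gate of this kind can be sketched as a per-sample threshold test. This is a deliberately simplified illustration: the function name and the sample values are made up for the example, and real editing software usually works on a smoothed signal level with attack and release times rather than on individual samples.

```python
def noise_gate(samples, threshold):
    """Silence every sample whose magnitude is below the threshold."""
    return [s if abs(s) >= threshold else 0 for s in samples]


# Quiet background noise around a louder burst (illustrative values).
recording = [3, -2, 120, -90, 1]
gated = noise_gate(recording, threshold=10)
```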
16Software for audio editing
- DC offset
- Used to change the baseline of a sound.
- A recorded wave that is not centered around the
zero baseline in the waveform display is said to
have a DC offset.
- To correct these offsets, a constant value is
added to each sample.
17Software for audio editing
- DC offset
- DC offsets are usually caused by electrical
mismatches between the sound card and microphone
- Example: if a DC offset of 100 exists, then a
-100 DC offset should be applied to cancel out
the existing offset
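The correction in the example above can be sketched as follows. Estimating the offset as the mean of the samples is an assumption of this sketch (the slides only say that a constant is added); the function name and sample values are illustrative.

```python
def remove_dc_offset(samples):
    """Estimate the DC offset as the mean and subtract it from every sample."""
    offset = sum(samples) / len(samples)
    return offset, [s - offset for s in samples]


# A wave centered around 100 instead of the zero baseline.
offset, centered = remove_dc_offset([100, 110, 90, 105, 95])
```

Here the estimated offset is 100, so the correction applied to each sample is -100, as in the slide's example.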
18Software for audio editing
- Equalizer
- An equalizer is needed to correct the input
signal.
- Even when recording in a low-noise environment,
noise from equipment will cause distortion in
the recording.
- The noise from the computer
and the electrical supply needs to be removed.
19Software for audio editing
- Equalizer
- 2 types of Equalizer
- Graphic EQ.
- Parametric Equalizer.
20Software for audio editing
- Equalizer
- Graphic EQ: it divides all the possible
frequencies into 10 bands
- It is also used to provide a clearer and cleaner
voice after recording
21Software for audio editing
- Equalizer
- Parametric Equalizer
- It uses filters; a filter removes some
frequencies.
- Four basic types of parametric filter styles:
high-pass, low-pass, band-pass and band-reject.
- The more subtly a signal has been filtered, the more
natural it sounds.
22Speech Processing
- Human speech frequency range: 40 Hz ... 10 kHz
- Where quality requirements are lower, the range
40 Hz ... 4 kHz is enough
- E.g. traditional plain telephone network
23Speech Transmission
- Pulse Code Modulation (PCM) - the simplest
- Linear Predictive Coding (LPC)
- based on speech synthesis
- the bit stream is a command sequence to the
speech synthesizer
- optimized for human speech, not general sound
- The length of a human speech sound is 30...50 ms; since
packets are generally 20 ms long, the loss of
one packet does not mean the loss of a whole
speech sound
24Video
- The human eye has the property that when an image
is flashed on the retina, it is retained for some
number of milliseconds before decaying
- If a sequence of images is flashed at 50 or more
images/sec, the eye does not notice that it is
looking at discrete images
- All movie/video systems exploit this principle to
produce moving pictures
25Analogue Video Formats
- NTSC
- America, Japan
- Resolution
- 525 lines/frame
- 30 frames/sec
- PAL
- Europe, Australia
- Resolution
- 625 lines/frame
- 25 frames/sec
- SECAM
- France, Eastern Europe
- It is not used for video conferencing
26Flicker and Interlacing
- While 25 frames/sec is enough to capture smooth
motion, at that frame rate many people,
especially older ones, will perceive the image to
flicker
- This is because the old image has faded off the retina
before the new one appears
- Increasing the frame rate would require using
more bandwidth
- The solution to the flicker problem: instead of
displaying the scan lines in order, first all the
odd scan lines are displayed, then the even ones
are displayed
- Each of these half frames is called a field
- Experiments show that although people notice
flicker at 25 frames/sec, they do not notice it at
50 fields/sec
- This technique is called interlacing
- Noninterlaced television or video is said to be
progressive
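The split of one frame into two fields can be sketched as below. A frame is modeled here simply as a list of scan lines, which is an assumption of the example; the function names are illustrative.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its odd and even fields."""
    odd_field = frame[0::2]    # scan lines 1, 3, 5, ... (counting from 1)
    even_field = frame[1::2]   # scan lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild the progressive frame by alternating lines of both fields."""
    frame = []
    for i, line in enumerate(odd_field):
        frame.append(line)
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = ["line1", "line2", "line3", "line4", "line5"]
odd, even = split_fields(frame)
```

Sending the two fields one after the other doubles the refresh rate seen by the eye (50 fields/sec for 25 frames/sec) without sending more lines per second.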
27Lossless Compression Encodings
- Run Length Encoding (RLE)
- In many kinds of data, strings of repeated
symbols are common
- Such a run can be replaced by a special marker not
otherwise allowed in the data, followed by the
symbol comprising the run, followed by how many
times it occurred
- If the special marker occurs in the data, it is
duplicated
- Variable length encodings, e.g. statistical
encoding
- Short codes represent common symbols and longer
ones represent infrequent symbols
- E.g., Huffman coding, Lempel-Ziv algorithm
- Its applications
- Graphics Interchange Format (GIF)
- Portable Network Graphics (PNG)
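The run-length scheme described on this slide (marker, symbol, count, with a literal marker duplicated) can be sketched as follows. The marker value 0x00, the minimum run length of 4 and the one-byte count are assumptions chosen for the example, not part of any particular file format.

```python
MARKER = 0x00   # assumed marker byte for this sketch
MIN_RUN = 4     # shorter runs are cheaper to store literally


def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        sym = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == sym and run < 255:
            run += 1
        if sym == MARKER:
            # A marker that occurs in the data is duplicated.
            out += bytes([MARKER, MARKER]) * run
        elif run >= MIN_RUN:
            out += bytes([MARKER, sym, run])   # marker, symbol, count
        else:
            out += bytes([sym]) * run
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] != MARKER:
            out.append(encoded[i])
            i += 1
        elif encoded[i + 1] == MARKER:         # duplicated marker = literal
            out.append(MARKER)
            i += 2
        else:                                  # marker, symbol, count
            out += bytes([encoded[i + 1]]) * encoded[i + 2]
            i += 3
    return bytes(out)


original = b"AAAAAAB\x00CC"
packed = rle_encode(original)
```

Because the symbol after a marker is never the marker itself in a run record, the decoder can tell a run apart from an escaped literal marker.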
28Lossy Compression Encodings
- Audio compression
- Pulse Code Modulation (PCM), G.711
- Sub-Band Coding (SBC)
- Global System for Mobile Communication (GSM)
- Video compression
- JPEG, Motion JPEG
- H.261, H.263
- MPEG
- Properties
- Differential encoding
- Motion compensation
- The following types are widely used on the Internet
(e.g.)
- MPEG-1 Audio Layer III (MP3)
- MPEG-1 Video
29GIF
- The Graphics Interchange Format (GIF) was first
developed for image transfer among users of the
CompuServe online service
- It serves two purposes
- its encoding is cross-platform
- it uses special compression technology that can
significantly reduce the size of the image file
for faster transfer over a network
- GIF compression is lossless, too: none of an
image's original data is altered or deleted, so
the uncompressed and decoded image exactly
matches its original
30The Two Versions of the GIF
- Even though GIF image files invariably have the
.gif (or .GIF) filename suffix, there actually
are two GIF versions
- the original GIF87a and
- an expanded GIF89a
- it supports several new features including
transparent backgrounds
- Browsers support both GIF versions, which use
the same encoding scheme that maps 8-bit pixel
values to a color table, for a maximum of 256
colors per image
- Most GIF images have even fewer colors and there
are special tools to simplify the colors in more
elaborate graphics
- By simplifying the GIF images, you create a
smaller color map and enhance pixel redundancy
for better file compression and consequently faster
downloading
31Limitations of the GIF
- Because of the limited number of colors, a
GIF-encoded image is not always appropriate,
particularly for photorealistic pictures
- Rather, GIFs make excellent icons, reduced color
images, and drawings
- Because the graphical browsers support the GIF
format, it is currently the most widely accepted
image-encoding format on the Web
- It is acceptable for both inline images and
externally linked ones
- When in doubt as to which image format to use,
choose GIF
- It will work in almost any situation
32Interlacing I
- GIF images can be made to perform two special
tricks: interlacing and transparency
- Normally, a GIF-encoded image is a sequence of
pixel data, in order row-by-row, from top to
bottom of the image
- While the common GIF image renders onscreen like
pulling down a window shade, interlaced GIFs open
like a Venetian blind
- With interlacing, a GIF image seemingly
materializes on the display, rather than
progressively flowing onto it from top to bottom
- That's because interlacing sequences every fourth
row of the image
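The row order of an interlaced GIF can be sketched as below. The GIF89a format stores rows in four passes: every 8th row from row 0, every 8th row from row 4, every 4th row from row 2, and every 2nd row from row 1. After the first two passes every fourth row of the image is present, which matches the description above. The function name is an assumption of this example.

```python
def gif_interlace_order(height: int) -> list:
    """Return image row numbers in the order an interlaced GIF stores them."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]   # (first row, step) per pass
    order = []
    for first_row, step in passes:
        order.extend(range(first_row, height, step))
    return order


rows_8 = gif_interlace_order(8)
rows_16 = gif_interlace_order(16)
```

For a 16-row image the first four rows to arrive are 0, 8, 4 and 12, so a coarse version of the whole picture is visible after a quarter of the rows.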
33Interlacing II
- That way, users get to see a full image--top to
bottom, albeit fuzzy--in a quarter of the time it
takes to download and display the remainder of
the image
- The resulting quarter-done image usually is clear
enough so that users with slow network
connections can evaluate whether or not to take
the time to download the remainder of the image
file
- Interlaced GIFs appear first with poor resolution
and then improve in resolution until the entire
image has arrived, as opposed to arriving
linearly from the top row to the bottom row
- This is great to get a quick idea of what the
entire image will look like while waiting for the
rest
34Interlacing III
- Not all graphical browsers, although able to
display an interlaced GIF, are actually able to
display the materializing effects of interlacing
- With those that do, users still can defeat the
effect by choosing to delay image display until
after download and decoding
- NCSA Mosaic, on the other hand, always downloads
and decodes images before display, so it doesn't
support the effect at all
35Transparency I
- The other popular effect available with GIF
images--GIF89a formatted images actually--is the
ability to make a portion of them transparent so
that what's underneath--usually the browser
window's background--shows through
- Transparent GIFs are useful because they appear
to blend in smoothly with the user's display
- They do this by assigning one color to be
transparent
- The transparent GIF image has one color in its
color map designated as the background color
- The browser simply ignores any pixel in the image
that uses that background color, thereby letting
the display window's background show through
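The way a browser treats the designated color can be sketched as follows. The image is modeled as rows of color-map indices and colors as RGB tuples; these data structures and the function name are assumptions made for the example.

```python
def composite(pixels, color_map, transparent_index, page_background):
    """Render indexed pixels, letting the page show through one color index."""
    return [
        [page_background if p == transparent_index else color_map[p]
         for p in row]
        for row in pixels
    ]


color_map = [(255, 255, 255), (200, 0, 0)]   # index 0: white, index 1: red
logo = [[0, 1, 0],
        [1, 1, 1]]
grey_page = (192, 192, 192)
shown = composite(logo, color_map, transparent_index=0,
                  page_background=grey_page)
```

Every pixel that uses index 0 takes the page color, so the red shape appears without its white rectangle.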
36Transparency II
- By carefully cropping its dimensions and by using
a solid, contiguous background color, a
transparent image can be made to seamlessly meld
into a page's surrounding content or float above
it
- Transparent GIF images are great for any graphic
you want to meld into the document and not stand
out as a rectangular block
- Transparent GIF logos are very popular, as are
transparent icons and dingbats--any graphic that
should appear to have an arbitrary, natural shape
- You may also insert a transparent image inline
with conventional text to act as a special
character glyph within conventional text
37Disadvantages of the Transparency
- The downside to transparency is that the GIF
image will look lousy if you don't remove its
border when it is included in a hyperlink anchor (<a>
tag), or is otherwise specially framed
- Also, content flows around the image's
rectangular dimensions, not adjacent to its
apparent shape
- That can lead to unnecessarily isolated images or
odd-looking sections in your HTML pages
- Neither GIF trick--interlacing or
transparency--just happens; you need special
software to prepare the GIF file.
38Making Transparent and Interlaced GIFs
- Transparent and interlaced GIFs can be made
through the web without running any utility
software on your own system through the
Visioneering image manipulation page
<URL:http://www.vrl.com/Imaging>, which will
access your image through the web and produce an
enhanced version for you to save
- A unique approach to the problem is offered by
Imagizer (<URL:http://www.minet.com/minet/imagizer.html>),
which transforms your images on
the fly when sending them to the user, supporting
thumbnails and TIFF-GIF conversion as well as
interlacing.
- Of course, there is a tradeoff between storage
space and CPU usage
- There are many other utilities for generating
such GIFs
39JPEG I
- The Joint Photographic Experts Group is a
standards body that developed what is now known
as the JPEG image-encoding format
- Like GIFs, JPEG images are platform independent
and specially compressed for high-speed transfer
via digital communication technologies
- Unlike GIF, JPEG supports millions of
colors (24 bits per pixel) for more detailed, photorealistic digital
images
- JPEG uses special algorithms that yield much
higher data-compression ratios. It is not
uncommon, for example, for a 200 kilobyte GIF
image (which uses lossless compression) to be
reduced to a 30 kilobyte JPEG image
40JPEG II
- To achieve that amazing compression, JPEG does
lose some image data
- However, you can adjust the degree of
"lossiness" with special JPEG tools, so that
although the uncompressed image may not exactly
match the original, it will be close enough that
most people cannot tell the difference
- The JPEG format is nearly universally understood
by today's graphical browsers. Some, most notably
Netscape, have a built-in JPEG decoder
- Others, like Mosaic, invoke an external viewing
tool (helper application) for decoding and
displaying JPEG files, which invariably are
stored with a .jpg (or .JPG) filename suffix
41Comparison of JPEG and GIF
- JPEG is for photographic images
- GIF is for line-art images, such as icons, graphs
and line-art logos
- JPEG produces smudgy line art and GIF produces
large and washed-out photographs
- However, never convert GIF to JPEG if you can
possibly help it
- Once your photograph has been reduced to the mere
256 colors supported by GIF, it's too late
- Since JPEG is an approximate representation of
the image, you shouldn't save things as JPEG and
then edit them further later and save them again
42Comparison of JPEG and GIF (Cont.)
- You can expect progressive loss of quality each
time you do that, especially with different JPEG
quality settings
- If you must edit a photographic image, work with
it in TIFF or PNG format until it is ready for
publication, then convert it to JPEG for the web
- Go straight from a lossless 24-bit format
supported by your scanner, such as TIFF or PNG,
to JPEG
- If a given image cannot tolerate being reduced to
8 bits for GIF or losing precise accuracy for
JPEG, TIFF and PNG are the best options
43The Progressive JPEG
- Progressive JPEG is a new variation on the JPEG
image format
- It is like interlaced GIF in that it fades in
gradually instead of being drawn from top to
bottom
- This enables the viewer to understand what the
image is about very quickly
- Progressive JPEG is much smoother than
interlaced GIF
44XBM
- The X BitMap image format is an ASCII
representation of a monochrome image
- It is designed to capture small, black-and-white
icons in a form that can be directly included in
C and C++ programs
- XBM format offers no compression features or
support for color images
- It makes little sense to use XBM for any other
type of image: the resulting XBM file would be
far larger than the equivalent GIF version of the
same image
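What such an ASCII representation looks like can be sketched by generating one. The sketch below writes a tiny monochrome image as XBM text: two `#define` lines for width and height and a C array of bytes, with the leftmost pixel of each group of eight stored in the least significant bit. The function name `to_xbm` and the sample image are assumptions for this example.

```python
def to_xbm(name: str, rows) -> str:
    """Render rows of 0/1 pixels as the text of an XBM file."""
    height, width = len(rows), len(rows[0])
    data = []
    for row in rows:
        for x0 in range(0, width, 8):          # each row is padded to bytes
            byte = 0
            for bit, pixel in enumerate(row[x0:x0 + 8]):
                if pixel:
                    byte |= 1 << bit           # leftmost pixel = lowest bit
            data.append(byte)
    body = ", ".join(f"0x{b:02x}" for b in data)
    return (f"#define {name}_width {width}\n"
            f"#define {name}_height {height}\n"
            f"static unsigned char {name}_bits[] = {{ {body} }};\n")


icon = [[1, 0, 0, 0, 0, 0, 0, 1],
        [0, 1, 1, 1, 1, 1, 1, 0]]
xbm_text = to_xbm("icon", icon)
```

Each byte of image data costs about six characters of text, which illustrates why the format is far larger than GIF for anything but small icons.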
45Basic Steps of the Packet Audio/Video Manipulation
- Digitization of the analogue signal into a serial bit
stream
- Fragmenting the digitized bit stream into packets
- Transmission of packets through the network to
one or more receivers
- Reassembly of the packets at each receiver
to obtain the original bit stream
- Delivery of bits to the output audio/video codec
of the receiver
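The fragmenting and reconstruction steps can be sketched as follows. Tagging each packet with a sequence number so the receiver can restore the order is an assumption of this sketch (the slides do not say how the receiver reorders packets), as are the function names and the payload size.

```python
import random


def packetize(stream: bytes, payload_size: int):
    """Fragment a byte stream into (sequence number, payload) packets."""
    return [(seq, stream[offset:offset + payload_size])
            for seq, offset in enumerate(range(0, len(stream), payload_size))]


def reassemble(packets) -> bytes:
    """Restore the original stream from packets received in any order."""
    return b"".join(payload for _, payload in sorted(packets))


stream = bytes(range(50))            # stands in for digitized audio/video
packets = packetize(stream, payload_size=16)
random.shuffle(packets)              # the network may reorder packets
received = reassemble(packets)
```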
46Creation and Transmission of the Media Stream
- Steps of Creation
- Sampling
- Quantization
- Packetization
- Controlled transmission methods of the media
content
- server-based
- server-less
47History of the Integrated Multimedia
- Embedding images (NCSA Mosaic)
- Virtual reality applications (VRML)
- Applets on web pages (in the browser; Sun Java)
48Embedding New Media
- Audio and video files - long wait
- Live videos - essentially without any wait, e.g.
- MS Windows Media technology
- Netscape - plug-in technology
- Vosaic
- Internet-based telephoning
49Services Provided by the Internet
- Best effort (for historical reasons)
- every data packet is handled in the same manner
in the network
- The Internet itself does not guarantee any
quality of service
- certain regions of the Internet are congested
- the success of the transmission depends on the
network conditions
- poor transmission quality in certain places
- excellent service in other places
- no guarantee of delivery
- relative delays of each packet can vary
50The Main Parameters of the Quality of Service
(QoS)
- Delay
- Jitter
- Bandwidth
- Reliability
51Human-Recognizable Effect of the Delay Occurring
in Real-Time Applications
52Receiving Audio and Video by Using HTTP
(Serverless Streaming)
- HTTP is essentially a file retrieval protocol
- Because of TCP (used by HTTP), the whole media
file must be downloaded
- A hyperlink in a web page refers to a
remote audio file
- When the browser follows the link, the audio file
is transferred by HTTP or FTP
- In this way the browser can download it
- These kinds of systems have neither bandwidth
adaptation nor flow control
53The HTTP-Based Throughput
54Server-Based Streaming
- The A/V applications use special transport
protocols instead of TCP
- As a result, the network problems handled by
TCP must be solved in the applications
- Instead of HTTP, UDP-based protocols are used
- Streaming protocols start playing back in real time
(with a small delay)
- These use strong compression
- Requirements of streaming protocols
- Accepting packet loss - no retransmission
- Delay control (jitter - buffering)
- Dynamic throughput adaptation
55Server-Based Streaming
56Cooperation of Streaming Protocols with HTTP
- The transmission starts with HTTP connection
setup
- The player connects directly to the A/V server by
using the stream protocol
- Playout is integrated (at the user application
level) into the web
- The HTTP server and the stream server are not
necessarily on the same hardware, since there are
two individual processes, the HTTP serving and
the stream serving
57Cooperation of the HTTP and Streaming Media
Server
58Traditional Media File Formats and Their
Extensions
59File Size and Download Time
- Originally the file formats were designed for
local access
- large file sizes
- these cannot be played back during download
- S = D × B
- S - the file size
- D - recording time
- B - bit rate, the number of bits to be stored for 1
second of A/V data
- Response time (download time) = S / T
- T - network throughput capability, bit/sec
- In case of a modem connection, T = 56 Kbit/s
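The two formulas on this slide can be worked through with a short sketch. The recording used here (one minute of 16-bit stereo audio sampled at 44.1 kHz) is an assumed example, and 56 Kbit/s is taken as 56,000 bit/s.

```python
def file_size_bits(recording_time_s: float, bit_rate_bps: float) -> float:
    """S = D x B: file size from recording time and bit rate."""
    return recording_time_s * bit_rate_bps


def download_time_s(size_bits: float, throughput_bps: float) -> float:
    """Response time = S / T."""
    return size_bits / throughput_bps


B = 44_100 * 16 * 2              # assumed bit rate: CD-quality stereo, bit/s
S = file_size_bits(60, B)        # D = 60 s of recording
wait = download_time_s(S, 56_000)
```

Under these assumptions a one-minute recording takes 1512 seconds (about 25 minutes) to download over the modem, which is why formats designed for local access are impractical for download-then-play use.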
60Multimedia File Formats
- Older ones
- Vosaic
- StreamWorks
- Internet Wave
- VDOLive
- QuickTime
- RealMedia
- Windows Media
- Shockwave and Flash
- SHOUTCast
61Some Streaming Media File Extensions
62The RealAudio Streaming
- Connection setup (TCP)
- A client asks for a stream from the server
- negotiating the parameters
- Sending data (UDP)
- The server continuously sends the encoded data
- The client decodes and plays back the A/V signal
- Verification (TCP)
- Monitoring and evaluating the experienced
parameters
63RealAudio Encoding Standards
- RealAudio 2.0 - 14.4
- 14.4 Kbps, to 4.0 kHz
- RealAudio 3.0 - 28.8 Stereo
- 28.8 Kbps, to 4.0 kHz
- RealAudio 3.0 - ISDN Mono
- In ISDN line, to 11 kHz
- RealAudio 3.0 - Dual ISDN Stereo
- in two ISDN B channels, 16.0 kHz
64Delay Control of the RealAudio
- Buffering 4-5 sec
- In online telephony applications, setting
the proper playout delay is a more difficult task
than in video-on-demand, since in the
first case the possible buffer is smaller
- unnoticeable delay: less than 150 ms
- acceptable delay: 150 ms - 400 ms
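The delay limits above can be expressed as a small check. The label for delays above 400 ms is an assumption of this sketch, since the slide only names the two lower ranges; the function name is illustrative.

```python
def classify_delay(delay_ms: float) -> str:
    """Classify a one-way delay for interactive (telephony) use."""
    if delay_ms < 150:
        return "unnoticeable"
    if delay_ms <= 400:
        return "acceptable"
    return "too long"   # assumed label; the slide does not name this range


labels = [classify_delay(d) for d in (80, 150, 400, 4500)]
```

A 4-5 second buffer, as used by RealAudio, falls far outside the range usable for a conversation, which is why telephony applications must work with a much smaller buffer.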