Title: Media Types
1Media Types
- Text
- Image
- Graphics
- Audio
- Video
2Text
Representation
ASCII
ISO Character Sets
Marked-up Text
Structured Text
Hypertext
Operations
Character Operations
String Operations
Editing
Formatting
Pattern-matching and searching
Sorting
Compression
Encryption
Language-specific operations
3Text - Representation
- ASCII
- 7-bit code
- 128 values in ASCII character set
- use of the 8th bit by text editors/word processors
creates incompatibility
- ISO character sets
- extend ASCII to support non-English text
- ISO Latin provides support for accented
characters - à, ö, ø, etc.
- ISO sets include Chinese, Japanese, Korean and
Arabic
- UNICODE
- 16-bit format
- 65,536 (2^16) different code values
4Text - Representation
- Marked-up text
- nroff, troff
- LaTeX
- SGML
- HTML
- HyTime
- XML, XSL, XLL
- Structured Text
- structure of text represented in a data structure,
usually tree-based
- ODA, structure embedded in byte-stream with
content
- Hypertext
- non-linear
- graph or web structure of nodes and links
- currently the subject of intensive ISO standards
activity
5Text - Operations
- Character operations
- basic data type with assigned value
- permits direct character comparison (a < b)
- String operations
- comparison
- concatenation
- substring extraction and manipulation
- Editing
- perhaps the most familiar set of operations on
text
- cut/copy/paste
- strings v. blocks, dependent on document structure
6Text - Operations
- Formatting
- interactive or non-interactive (WYSIWYG v. LaTeX)
- formatted output
- bitmap
- page description language (PostScript, PDF)
- font management
- typeface
- point size (1 point = 1/72 of an inch)
- TrueType fonts - geometric description, kerning
- Pattern-matching and Searching
- search and replace
- wildcards
- regular expressions
- for large bodies of text, or text databases, use
of inverted indices, hashing techniques and
clustering.
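As an illustration of the inverted index idea mentioned above, the following is a minimal sketch in Python. The function names and the sample documents are made up for this example, and real text databases add stemming, ranking and on-disk storage that are not shown here.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return the ids of documents that contain every query word."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()

# Hypothetical three-document collection
docs = {
    1: "text can be compressed",
    2: "images can be compressed and filtered",
    3: "text can be searched",
}
index = build_inverted_index(docs)
hits = search(index, "text", "compressed")   # only document 1 has both words
```

The point of the structure is that a query touches only the posting sets for its words, rather than scanning every document.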
7Text - Operations
- Sorting
- numerous varieties of sort, all of them
extensively studied in basic programming
- sort complexity is a major factor in data
handling performance
- Compression
- ASCII uses 7 bits per character, though most
word-processors actually use the 8th bit and so
use up a byte per character
- Information theory estimates 1-2 bits per
character to be sufficient for natural language
text
- This redundancy can be removed by encoding
- Huffman coding varies the number of bits used to
represent characters, with the shortest codes for
the highest frequency characters
- Lempel-Ziv identifies repeating strings and
replaces them by pointers to a table
- Both techniques compress English text at a ratio
of between 2:1 and 3:1
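To make the Huffman idea concrete, here is a small sketch that computes Huffman code lengths for a sample string and compares the encoded size with one byte per character. The sample sentence and the function name are assumptions for illustration; the sketch reports code lengths only and does not build the bit strings or store the code table.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {character: code length in bits} for a Huffman code of text."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {ch: 1 for ch in freq}
    # Heap entries: (weight, unique tie-breaker, {char: depth so far})
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)       # two least frequent subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {ch: d + 1 for ch, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

sample = "this is an example of english text for a huffman code"
lengths = huffman_code_lengths(sample)
freq = Counter(sample)
encoded_bits = sum(freq[ch] * lengths[ch] for ch in freq)
fixed_bits = 8 * len(sample)                 # one byte per character
```

Frequent characters such as the space end up with the shortest codes, so `encoded_bits` comes out well below `fixed_bits` for ordinary English text.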
8Text - Operations
- Encryption
- text encryption is widely used in electronic mail
and networked information systems
- most widely-used techniques
- DES
- RSA public-key
- PGP
- subject of major controversy
- key escrow systems
- Clipper chip
- strong encryption now being legally outlawed in
a number of countries
- Language-specific operations
- spell-checking
- parsing and grammar checking
- style analysis
9Image
Representation
Colour Model
Alpha Channels
Number of Channels
Channel Depth
Interlacing
Indexing
Pixel Aspect Ratio
Compression
Operations
Editing
Point operations
Filtering
Compositing
Geometric transformations
Conversion
10Image - Representation
- Colour Model
- 2 main types
- colour production on output device
- theory of human colour perception
- CIE colour space
- international standard used to calibrate other
colour models
- developed in 1931, as CIE XYZ, based on the
tristimulus theory of colour specification
11Image - Representation
- RGB
- numeric triple specifying red, green and blue
intensities
- convenient for video display drivers since
numbers can be easily mapped to voltages for RGB
guns in colour CRTs
- HSB
- Hue - dominant colour of sample, angular value
varying from red to green to blue at 120°
intervals
- Saturation - the intensity of the colour
- Brightness - the amount of gray in the colour
- CMYK
- displays emit light, so produce colours by adding
red, green and blue intensities
- paper reflects light, so to produce a colour on
paper one uses inks that subtract all colours
other than the one desired
- printers use inks corresponding to the
subtractive primaries, cyan, magenta and yellow
(complements of RGB)
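Since cyan, magenta and yellow are the complements of red, green and blue, the simplest conversion is a subtraction from full intensity. The sketch below assumes intensities normalised to the range 0 to 1 and ignores ink impurities and device calibration, which real printing workflows must handle.

```python
def rgb_to_cmy(r, g, b):
    """Convert RGB intensities in [0, 1] to CMY ink amounts in [0, 1]."""
    return (1.0 - r, 1.0 - g, 1.0 - b)

red = rgb_to_cmy(1.0, 0.0, 0.0)      # (0.0, 1.0, 1.0): magenta plus yellow ink
white = rgb_to_cmy(1.0, 1.0, 1.0)    # (0.0, 0.0, 0.0): no ink, bare paper
```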
12Image - Representation
- additionally, since inks are not pure, a special
black ink is used to give better blacks and grays
- YUV
- colour model used in the television industry
- also YIQ, YCbCr, and YPbPr
- Y represents luminance, effectively the
black-and-white portion of a video signal
- U and V are colour difference signals, which form
the colour portion of a video signal and are called
chrominance or chroma
- YUV makes efficient use of bandwidth as the human
eye has greater sensitivity to changes in
luminance than chrominance, so bandwidth can be
better utilised by allocating more to luminance
and less to chrominance
- Alpha Channels
- images may have one or more alpha channels
defining regions of full or partial transparency
13Image - Representation
- can be used to store selections and to create
masks and blends
- Number of channels
- the number of pieces of information associated
with each pixel
- usually the dimensionality of the colour model
plus the number of alpha channels
- Channel depth
- number of bits per pixel used to encode the
channel values
- commonly 1, 2, 4 or 8 bits, less commonly 5, 6, 12
or 16 bits
- in a multiple channel image, different channels
can have different depths
- Interlacing
- storage layout of a multiple channel image could
separate channel values (all R values, followed
by all G, followed by all B) or could use
interlacing (all RGB for pixel 1, all RGB for
pixel 2, ...)
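The two storage layouts described above can be sketched as follows. The pixel values are arbitrary and the function names are chosen for this example only.

```python
def to_planar(pixels):
    """[(r, g, b), ...] -> all R values, then all G, then all B."""
    return [p[c] for c in range(3) for p in pixels]

def to_interleaved(pixels):
    """[(r, g, b), ...] -> r1, g1, b1, r2, g2, b2, ..."""
    return [value for p in pixels for value in p]

pixels = [(10, 20, 30), (11, 21, 31)]
planar = to_planar(pixels)             # [10, 11, 20, 21, 30, 31]
interleaved = to_interleaved(pixels)   # [10, 20, 30, 11, 21, 31]
```

Both layouts hold the same values; they differ only in which values sit next to each other in storage.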
14Image - Representation
- Indexing
- pixel colours can be represented by an index into a
colour map or colour lookup table (CLUT)
- Pixel aspect ratio
- ratio of pixel width to height
- square pixels are simple to process, but some
displays and scanners work with rectangular
pixels
- if the pixel aspect ratios of an image and a
display differ the image will appear stretched or
squeezed
- Compression
- a page-sized 24-bit colour image produced by a
scanner at 300dpi takes up about 20 Mbytes
- many image formats compress pixel data, using
run-length coding, LZW, predictive coding and
transform coding
- there are many image formats; JPEG, GIF, TIFF and
BMP are the most widely used
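The storage figure quoted for a scanned page can be checked with a short calculation. The 8 x 10 inch scan area is an assumption made for this sketch (the slide does not state the page size), and the result is for uncompressed pixel data only.

```python
def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Storage for an uncompressed raster image, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed scan area of 8 x 10 inches at 300 dpi, 24 bits per pixel
size = uncompressed_image_bytes(8, 10, 300, 24)
size_mbytes = size / 1_000_000    # 21.6, in line with "about 20 Mbytes"
```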
15Image - Operations
- These operations can operate directly on pixel
data or on higher-level features such as edges,
surfaces and volumes
- Operations on higher-level features fall into the
domain of image analysis and understanding and
will not be considered here
- Editing
- changing individual pixels for image touch-up,
forms the basis of airbrushing and texturing
- cutting, copying and pasting are supported for
groups of pixels, from simple shape manipulation
through to more complex foreground and background
masking and blending
- Point operations
- consist of applying a function to every pixel in
an image
16Image - Operations
- only use the pixel's current value; neighbouring
pixels cannot be used
- Thresholding
- a pixel is set to 1 or 0 depending on whether it
is above or below a threshold value - creates
binary images which are often used as masks when
compositing
- Colour Correction
- modifying the image to increase or reduce
contrast, brightness, gamma effects, or to
strengthen or weaken particular colours
- Filtering
- like point operations, filters operate on every
pixel in an image, but use values of neighbouring
pixels as well
- used to blur, sharpen or distort images,
producing a variety of special effects
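The difference between a point operation and a filter can be sketched on a tiny greyscale image held as a list of rows. The 3 x 3 mean filter and the choice to copy border pixels unchanged are assumptions for this example; other kernels and edge policies are equally valid.

```python
def threshold(image, level):
    """Point operation: each output pixel depends only on the same input pixel."""
    return [[1 if p > level else 0 for p in row] for row in image]

def mean_filter_3x3(image):
    """Filter: each output pixel is the mean of its 3x3 neighbourhood.
    Border pixels are copied unchanged (an assumed edge policy)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(image[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total / 9
    return out

image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
mask = threshold(image, 50)        # only the centre pixel becomes 1
blurred = mean_filter_3x3(image)   # centre pixel becomes 90 / 9 = 10.0
```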
17Image - Operations
- Compositing
- the combining of two or more images to produce a
new image
- generally done by specifying mathematical
relationships between the images
- Geometric Transformations
- basic transformations involve displacing,
rotating, mirroring or scaling an image
- more advanced transformations involve skewing and
warping images
- Conversions
- conversions between image formats are commonplace
and a number of public-domain, shareware and
commercial tools exist to support these
- other forms of conversion include compression and
decompression, changing colour models, and
changing image depth and resolution
18Graphics
Representation
Geometric Models
Solid Models
Physically-based Models
Empirical Models
Drawing Models
External formats for Models
Operations
Primitive Editing
Structural Editing
Shading
Mapping
Lighting
Viewing
Rendering
19Graphics - Representation
- The central notion of graphics, as opposed to
image data, is in the rendering of graphical data
to produce an image. A graphics type or model is
therefore the combination of a data type plus a
rendering operation
- Graphics Representation
- Please note - object in graphics modelling
usually refers to an element of the scene being
modelled, unless you are using object-oriented
graphics programming
- Geometric Models
- consist of 2D and/or 3D geometric primitives
- 2D primitives include lines, rectangles, ellipses
plus more general polygons and curves
- 3D primitives include the above plus surfaces of
various forms. Curves and curved surfaces are
described by parameterised polynomials
20Graphics - Representation
- primitives are first described in local or object
co-ordinates, then arranged in groups in a common
world co-ordinate system by applying modelling
transformations
- transformations include rotation, translation and
scaling
- primitives can be used to build structural
hierarchies, allowing each structure thus created
to be broken down into lower-level structures and
primitives (i.e. blueprinting)
- Several standard device-independent graphics
libraries are based on geometric modelling
- GKS (Graphical Kernel System (ISO))
- PHIGS (Programmer's Hierarchical Interactive
Graphics System (ISO)) - see also PHIGS and PEX
- OpenGL - portable version of Silicon Graphics
library
- Solid Models
- Constructive Solid Geometry (CSG) - solid objects
are combined using the set operators union,
intersection and difference.
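One way to picture the CSG set operators is to represent each solid by a point-membership test and combine the tests. This is only a sketch of the idea, using spheres as the sole primitive; it is not how any particular modelling package stores CSG trees.

```python
def sphere(cx, cy, cz, radius):
    """Return a membership test for a solid sphere."""
    def contains(x, y, z):
        return (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
    return contains

def union(a, b):
    return lambda x, y, z: a(x, y, z) or b(x, y, z)

def intersection(a, b):
    return lambda x, y, z: a(x, y, z) and b(x, y, z)

def difference(a, b):
    return lambda x, y, z: a(x, y, z) and not b(x, y, z)

left = sphere(0, 0, 0, 1)
right = sphere(1, 0, 0, 1)
lens = intersection(left, right)    # the overlap of the two spheres
bitten = difference(left, right)    # left sphere with the overlap removed
both = union(left, right)           # everything inside either sphere
```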
21Graphics - Representation
- Surfaces of revolution - a solid is formed by
rotating a 2D curve about an axis in 3D space
(lathing)
- Extrusion - a 2D outline is extended in 3D space
along an arbitrary path
- Using the above techniques will produce models
much faster than building them up from geometric
primitives, but rendering them will be expensive
- Physically-based Models
- realistic images can be produced by modelling the
forces, stresses and strains on objects
- when one deformable object hits another, the
resulting shape change can be numerically
determined from their physical properties
- Empirical Models
- complex natural phenomena (clouds, waves, fire,
etc.) are difficult to describe realistically
using geometric or solid modelling
22Graphics - Representation
- while physically based models are possible, they
may be computationally expensive or intractable
- the alternative is to develop models based on
observation rather than physical laws; such
models do not embody the underlying physical
processes that cause these phenomena but they do
produce realistic images
- fractals, probabilistic graph grammars (used for
branching plant structures) and particle
systems (used for fires and explosions) are
examples of empirical models
- Drawing Models
- describing an object in terms of drawing or
painting actions
- the description can be seen as a sequence of
commands to an imaginary drawing device -
PostScript, LOGO turtle graphics
- External formats for Models
- need for export/import formats between graphics
packages
- CGM and CAD formats are OK; PostScript and RIB are
render-only
23Graphics - Operations
- Primitive editing
- specifying and modifying the parameters
associated with the model primitives
- e.g. specify the type of a primitive and the
vertex coordinates and surface normals
- Structural editing
- creating and modifying collections of primitives
- establish spatial relationships between members
of collections
- Shading
- the modelling techniques described so far have
provided the means to specify the shape of
objects, but shading provides further information
for the image in describing the interaction of
light with the object. This interaction is
described in terms of the colour of an object,
how it reflects light and if it transmits light
24Graphics - Operations
- several general-purpose methods exist to describe
shading; most initially describe the surface of
the object using meshes of small, polygonal
surface patches
- flat shading - each patch is given a constant
colour
- Gouraud shading - colour information is
interpolated across a patch
- Phong shading - surface normal information is
interpolated across a patch
- Ray tracing, Radiosity - physical models of
light behaviour are used to calculate colour
information for each patch, giving highly
realistic results
- for photorealistic images extremely flexible
shading is required; tools such as RenderMan
actually provide programmable shaders which can
be attached to objects, simulating different
light effects and surface normals.
- Mapping
- techniques for enhancing the visual appearance of
objects
25Graphics - Operations
- Texture mapping
- an image, the texture map, is applied to a
surface
- requires a mapping from 3D surface coordinates to
2D image coordinates, so given a point on the
surface the image is sampled and the resulting
value used to colour the surface at that point
- shaders can also provide solid textures, where
the texture is obtained from 3D rather than 2D
space, and procedural textures, where the texture
is calculated rather than sampled
- Bump mapping
- as texture mapping, but used to change the normal
vector of the surface rather than the colour
- used to describe minor surface changes such as
scratches or scrapes
- Displacement mapping
- local modifications to the position of a surface
- produces ridges or grooves
26Graphics - Operations
- Environment mapping
- also known as reflection mapping, used to handle
limited forms of reflection
- more primitive technique than ray-tracing
- Shadow mapping
- similar to environment mapping in that it
provides a primitive lighting effect without the
expense of ray-tracing
- produces shadows
- Lighting
- within a model, in addition to the graphics
objects, there are lights to illuminate the
scene. There are various forms of light source,
each of which can be parametrically specified
- ambient light - background lighting, comes from
all directions with equal intensity
- point lights - come from specific points in
space, intensity governed by inverse square law
27Graphics - Operations
- directional lights - located at infinity in some
direction, intensity is constant
- spot lights - illuminating a cone-shaped volume
- Viewing
- to produce an image of a 3D model we require a
transformation which projects 3D world
coordinates onto 2D image coordinates
- transformation applied to viewing volume, that
part of the model that appears in the image
- view specification consists of selecting the
projection transformation, usually from parallel
or perspective projections although camera
attributes can be specified in some renderers,
and the view volume
- Rendering
- rendering converts a model, including shading,
lighting and viewing information, into an image
- software allows selection and fine-tuning of
control parameters
28Graphics - Operations
- output resolution - the width and height of the
output image in pixels, and the pixel depth
- rendering time - quick and low-quality v. slow
and high resolution
29Digital Video
Representation
Analog formats sampled
Sampling rate
Sample size and quantisation
Data rate
Frame rate
Compression
Support for interactivity
Scalability
Operations
Storage
Retrieval
Synchronisation
Editing
Mixing
Conversion
30Digital Video - Representation
- Analog formats sampled
- Digital video frames can be obtained in two ways
- Synthesis - usually by a computer program
- Sampling - of an analog video signal. Since
analog video comes in various different flavours,
according to frame rate, scan rate, composite v.
component, sampling rate and size vary.
31Digital Video - Representation
- Sampling rate
- the value of the sampling rate determines the
storage requirement and data transfer rate
- the lower limit for the frequency at which to
sample in order to faithfully reproduce the
signal, the Nyquist rate, is twice the highest
frequency within the signal
- video processing is simplified if each frame and
each scan line give rise to the same number of
samples, requiring the sampling frequency to be
an integer multiple of the scan rate
- Sample size and quantisation
- sample size is the number of bits used to
represent sample values
- quantisation refers to the mapping from the
continuous range of the analog signal to discrete
sample values
- choice of sample size is based on
- signal to noise ratio of sampled signal
- sensitivity of medium used to display frames
32Digital Video - Representation
- sensitivity of the human eye
- digital video commonly uses linear quantisation,
where quantisation levels are evenly distributed
over the analog range (as opposed to logarithmic
quantisation)
- Data rate
- high data rate formats can be reduced to lower
data rates by a combination of
- compression
- reducing horizontal and vertical resolution
- reducing the frame rate
- for example
- start with broadcast quality digital video at
10Mbytes/s
- divide the horizontal and vertical resolutions by
2, giving VHS quality resolution
- divide the frame rate by 2
- compress at a ratio of 10:1
- data rate becomes 1Mbit/s, suitable for use on
LANs and on optical storage devices (e.g. CD-ROM)
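The worked example above can be reproduced step by step. The function and parameter names are introduced for this sketch; the figures are the ones given on the slide.

```python
def reduced_data_rate(bytes_per_second, resolution_divisor,
                      frame_rate_divisor, compression_ratio):
    """Return the reduced data rate in bits per second."""
    # Dividing both width and height by d cuts the pixels per frame by d * d.
    pixels_factor = resolution_divisor ** 2
    reduced_bytes = bytes_per_second / (pixels_factor
                                        * frame_rate_divisor
                                        * compression_ratio)
    return reduced_bytes * 8

# 10 Mbytes/s, half resolution in each direction, half frame rate, 10:1 compression
rate = reduced_data_rate(10_000_000, 2, 2, 10)   # 1,000,000 bit/s = 1 Mbit/s
```

Halving the resolution in both directions accounts for a factor of 4, so the total reduction is 4 x 2 x 10 = 80.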
33Digital Video - Representation
- Frame rate
- 25 or 30 fps equates to analog frame rate, or
full-motion video
- at 10-15 fps motion is less accurately depicted
and the image flickers, but the data rate is much
reduced
- Compression
- we have already considered compression
techniques; in digital video we can compare
methods by three factors
- Lossy v. lossless
- Real-time compression - trade-off between
symmetric models and asymmetric models with
real-time decompression
- Interframe (relative) v. Intraframe (absolute)
compression (e.g. MPEG-1 v. Motion JPEG)
- Support for interactivity
- random access to frames
- differential rate and reverse playback
- cut and paste capability
34Digital Video - Representation
- Scalability
- scalable video allows control over video quality,
we can identify 2 forms
- Transmit scalability - encoded data rate is
chosen at compression time from a range of rates,
governed by transmission and processing
constraints and/or storage capacity. Currently in
use for low rate digital video
- Receive scalability - decoded data rate is chosen
at decompression time to match playback
requirements. Attractive concept but not yet
available in current video coding standards
- current approaches to low rate digital video
include
- DVI (Digital Video Interactive) - two forms,
Production Level Video (PLV) and Real-Time Video
(RTV). PLV only really intended for playback, RTV
produces poorer quality but is intended for
compression. Both use interframe compression to
achieve rates of 1Mbit/s, but require costly
hardware.
- MPEG-1 - 1Mbit/s
35Digital Video - Representation
- MPEG-2 - broadcast quality video at rates between
2-15Mbit/s
- MPEG-4 - low data rate video
- MPEG-7 - metadata standard for video
representation
- Motion JPEG
- px64 (CCITT H.261) - intended for video
applications using ISDN (Integrated Services
Digital Network). Known as px64 since it produces
rates that are multiples of ISDN's 64Kbit/s B
channel rate. Uses similar techniques to MPEG
but, since compression and decompression must be
real-time, quality tends to be poorer.
- H.263 - based on H.261, but offers 2.5 times
greater compression, uses MPEG-1 and MPEG-2
techniques.
36Digital Video - Operations
- Storage
- to record or play back digital video in real-time,
the storage system must be capable of sustaining
data transfer at the video data rate
- 4 main forms of storage for digital video are
- Magnetic tape - at present only magnetic tape can
provide the very high capacity storage required
for digital video at practical costs (1 hour of
CCIR 601 4:2:2 uses 72 Gbytes, while 1 hour of
digital HDTV requires nearly 1 Tbyte)
- Special purpose magnetic storage systems - useful
for short durations of high data rate digital
video, can be connected direct to external
equipment and are thus useful for capture and
editing (see diagram)
- Video memory boards - specialist boards with
large amounts of semiconductor memory (several
hundred Mbytes or more), capable of storing short
durations of uncompressed digital video, useful
for capture and editing.
37Digital Video - Operations
- General purpose magnetic and optical storage
systems - most low data rate video
representations (MPEG, etc.) were designed to
support the use of conventional storage media for
real-time video playback. The problem is the size
of storage: even using MPEG-1, 13 minutes of video
will fill a 100Mbyte disk.
- Retrieval
- uses frame addressing, as in analog video, but
there are some problems
- low data rate formats result in variable sized
frames, so an index giving frame offsets needs to
be maintained to support random access
- interframe compression techniques, e.g. MPEG,
only code key frames independently; other frames
are derived from these key frames. So random
access requires first finding the nearest key
frame and then using this to decode the desired
frame, again using the index but enhancing it
with key frame locations
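The retrieval scheme described above, an index of frame offsets enhanced with key frame locations, can be sketched as follows. The class name, offsets and key frame positions are hypothetical, and the sketch assumes each frame depends only on the frames between it and the preceding key frame, which is simpler than the bidirectional prediction MPEG can use.

```python
import bisect

class FrameIndex:
    """Per-frame byte offsets plus a sorted list of key frame numbers."""

    def __init__(self, offsets, key_frames):
        self.offsets = offsets                 # offsets[n] = byte offset of frame n
        self.key_frames = sorted(key_frames)   # frames coded independently

    def frames_to_decode(self, target):
        """Frames to read, in order, to reconstruct frame `target`."""
        pos = bisect.bisect_right(self.key_frames, target) - 1
        if pos < 0:
            raise ValueError("no key frame at or before the target frame")
        start = self.key_frames[pos]
        return [(n, self.offsets[n]) for n in range(start, target + 1)]

# Hypothetical stream: six variable-sized frames, key frames at 0 and 4
index = FrameIndex(offsets=[0, 900, 1100, 1250, 2300, 2450],
                   key_frames=[0, 4])
plan = index.frames_to_decode(2)   # start at key frame 0, then frames 1 and 2
```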
38Digital Video - Operations
- Synchronisation
- suffers the same problems as analog video, so uses
the same techniques
- digital video also has some additional techniques
not available in analog video, such as changing
resolution to maintain frame rate
- Editing
- 2 types
- tape-based - same procedures as with analog
video, except no generation loss and the players
are on the same machine
- nonlinear - basically a clips-library, using cut
and paste techniques to build a video sequence
- Mixing
- real-time effects, such as tumbles, wipes and
fades, are calculated in the same way as for
analog video; in fact for the majority of such
effects, whether the original source is analog or
digital, the effects are digitised
39Digital Video - Operations
- non-real-time effects are only possible using
digital video, and obviate the need for
specialist equipment, being only dependent on the
speed of the processor and the patience of the
user; storage considerations can be overcome with
the use of pointers and single frame editing
- Conversion
- variety of formats demands conversion between
formats
- real-time conversion requires specialist hardware
- compression/decompression within a single format
also requires specialist software/hardware
40Digital Audio
Representation
Sampling frequency
Sample size and quantisation
Number of channels (tracks)
Interleaving
Negative samples
Encoding
Operations
Storage
Retrieval
Editing
Effects and filtering
Conversion
41Digital Audio - Representation
- Digital Audio Representation
- 2 main areas
- telecommunications
- entertainment (audio CD)
- Produced by sampling a continuous signal
generated by a sound source. An analog-to-digital
converter (ADC) takes as input an electrical
signal corresponding to the sound and converts it
into a digital data stream. The reverse process,
to generate the sound through an amplifier and
speakers, involves a digital-to-analog converter
(DAC)
- Sampling frequency (rate)
- sampling theory shows that a signal can be
reproduced without error from a set of samples,
providing the sampling frequency is at least
twice the highest frequency present in the
original signal
42Digital Audio - Representation
- telephone networks allocate a 3.4kHz bandwidth to
voice-grade lines, thus a sampling rate of 8kHz
is used for digital telecommunications
- the human ear is sensitive to frequencies of up
to about 20kHz, so to digitise any perceivable
sound a sampling rate of over 40kHz is required
- Sample size and quantisation
- during sampling, the continuously varying
amplitude of the analog signal is approximated by
digital values; this introduces a quantisation
error, being the difference between the actual
amplitude and the digital approximation
- quantisation error is apparent as distortion, a
loss in audio quality, when the signal is
reconverted to analog form
- quantisation error can be reduced by increasing
the sample size, as allowing more bits per sample
will improve the accuracy of the approximation
43Digital Audio - Representation
- quantisation refers to breaking the continuous
range of the analog signal into a number of
unique digital intervals, based on one of a
number of schemes
- linear quantisation - uses equally spaced
intervals, so if the sample size is 3 bits and
the maximum signal variation is 5.0 then the
quantisation interval would be 5.0 / 8 = 0.625
units of signal amplitude
- nonlinear quantisation (especially logarithmic
quantisation) - uses non-equally spaced
intervals, lower amplitude intervals are more
closely spaced than higher amplitude, results in
greater sensitivity to lower amplitude sound
where the human ear is most sensitive
- Number of channels (tracks)
- speech quality audio is mono (1 track)
- stereo audio requires 2 tracks
- some consumer audio equipment uses 4 tracks
(quadrophonic)
- professional audio equipment uses 16, 32 or more
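The linear quantisation example on this slide (3-bit samples over a signal variation of 5.0) can be checked with a few lines of Python. The sketch assumes the signal runs from 0 up to the stated range, which the slide does not specify, and the function names are chosen for illustration.

```python
def quantisation_interval(signal_range, sample_bits):
    """Width of each interval when the range is split into 2**bits levels."""
    return signal_range / (2 ** sample_bits)

def quantise(value, signal_range, sample_bits):
    """Map a value in [0, signal_range] to an integer level (assumed 0-based)."""
    step = quantisation_interval(signal_range, sample_bits)
    level = int(value / step)
    return min(level, 2 ** sample_bits - 1)   # clamp the top edge of the range

step = quantisation_interval(5.0, 3)   # 5.0 / 8 = 0.625
level = quantise(2.0, 5.0, 3)          # 2.0 / 0.625 = 3.2, so level 3
```

Adding one bit to the sample size halves the interval, which is why larger samples reduce quantisation error.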
44Digital Audio - Representation
- Interleaving
- a multi-channel audio value can be encoded by
interleaving channel samples or by providing
separate streams for each channel
- the advantage of interleaving is in
synchronisation, and it also offers some benefits
in storage and transmission
- the disadvantages of interleaving are that it can
be wasteful of space or bandwidth if not all
channels are needed, it freezes the
synchronisation between channels thus preventing
temporal shifts, and it may not allow variation
in the number of channels
- Negative samples
- the voltages found in analog audio signals
alternate between positive and negative values
- negative values can be encoded successfully for
processing in two's complement, ones' complement
or sign-magnitude representation
45Digital Audio - Representation
- Encoding
- encoding audio data reduces storage and
transmission costs, and compressed audio also
provides better quality when compared to
uncompressed audio at the same data rate
- 2 commonly-used methods
- PCM (Pulse Code Modulation) - uses the fact that
a digital signal can be formed from a series of
pulses. PCM values are simply sequences of
uncompressed samples, so they provide a reference
format for comparison with more complex coding
methods
- ADPCM (Adaptive Differential Pulse Code
Modulation) - reduces PCM data rate by encoding
the differences between samples. ADPCM is widely
used and is associated with some encoding
standards, such as CCITT G.721.
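The core idea behind difference-based coding can be shown with plain delta encoding. This is deliberately not ADPCM: there is no adaptive step size and the differences are not quantised to fewer bits, so it only illustrates why differences between neighbouring samples are smaller numbers than the samples themselves. The sample values are invented.

```python
def delta_encode(samples):
    """Keep the first sample, then store each sample's difference from the previous one."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples, current = [], 0
    for d in deltas:
        current += d
        samples.append(current)
    return samples

pcm = [1000, 1004, 1009, 1011, 1008]
deltas = delta_encode(pcm)         # [1000, 4, 5, 2, -3]
restored = delta_decode(deltas)    # identical to pcm
```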
46Digital Audio - Operations
- Storage
- it is possible to record digital audio, even at
the data rates of the high quality formats, on
general purpose magnetic storage
- theoretically, a magnetic disk with a sustainable
transfer rate of 5 Mbytes per second could
play back 50 channels of CD-quality digital audio.
In practice this would not be possible without a
highly optimised layout, but one or two channels
are easily within the reach of small computer
systems
- since an hour of stereo digital audio, at the CD
data rate, requires over half a Gigabyte of
storage, tertiary storage in the form of DAT
tapes, CD discs or optical disks is normally
adopted, with the information being mounted onto
the system manually or through a jukebox
- Retrieval
- need to support random access and ensure
continuous flow of data to DAC
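The storage figures quoted on this slide follow from the CD data rate. The sketch below assumes the standard CD parameters of 44.1 kHz sampling and 16-bit samples, which the slide does not state explicitly.

```python
SAMPLE_RATE = 44_100      # CD sampling rate in Hz (assumed)
BYTES_PER_SAMPLE = 2      # 16-bit samples (assumed)

def bytes_per_second(channels):
    """Uncompressed PCM data rate for the given number of channels."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * channels

hour_of_stereo = bytes_per_second(2) * 3600          # 635,040,000 bytes
channels_at_5mb = 5_000_000 // bytes_per_second(1)   # 56 channels
```

An hour of stereo comes to about 635 Mbytes, which is the "over half a Gigabyte" figure, and a 5 Mbytes/s disk could in theory sustain roughly the 50 channels mentioned.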
47Digital Audio - Operations
- portions of audio sequences, segments, are
identified by their starting time and duration;
these can be located by mapping the starting
time to a segment address, which the file system
then maps to a physical address on disk
- where there is no direct mapping to enable
segment location by time code, an index of
segments must be separately maintained
- continuous flow of data is easy to maintain with
a dedicated storage system, but requires careful
control where storage is scheduled for a number
of such tasks
- Editing
- as with digital video, 2 types
- tape-based
- disk-based
- to avoid audible clicks when inserting one segment
into another, cross-fades are used, where the
amplitudes of the original segment and the
inserted segment are added and scaled about the
insertion point
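A cross-fade of the kind described above can be sketched as a weighted sum of the two segments over the overlap region. The linear gain ramp is an assumption made for this example; other fade curves (for instance equal-power) are common in practice.

```python
def crossfade(outgoing, incoming):
    """Linearly fade from `outgoing` to `incoming` over equal-length segments."""
    if len(outgoing) != len(incoming):
        raise ValueError("segments must be the same length")
    n = len(outgoing)
    mixed = []
    for i, (a, b) in enumerate(zip(outgoing, incoming)):
        gain_in = i / (n - 1) if n > 1 else 1.0   # ramps from 0.0 to 1.0
        mixed.append(a * (1.0 - gain_in) + b * gain_in)
    return mixed

# Fading a constant signal into silence over five samples
faded = crossfade([100, 100, 100, 100, 100], [0, 0, 0, 0, 0])
# [100.0, 75.0, 50.0, 25.0, 0.0]
```

Because the two gains always sum to 1, there is no sudden jump in amplitude at the insertion point, which is what removes the click.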
48Digital Audio - Operations
- digital audio also supports non-destructive
editing, where the segments of data are accessed
through a data structure known as a play-list,
which essentially contains a set of pointers to
the data and details on ordering and other forms
of edit to be performed on the data when it is
joined
- Effects and filtering
- digital filtering techniques permit a number of
effects on audio
- Delay
- Equalisation
- Normalisation
- Noise reduction
- Time compression and expansion
- Pitch shifting
- Stereoisation
- Acoustic environments
- Conversion
- one format to another (e.g. uncompressing ADPCM -> PCM)
- altering encoding parameters (e.g. resampling at
lower frequency)
49Music
Representation
Operational v. Symbolic
MIDI
SMDL
Operations
Playback, Synthesis
Timing
Editing, Composition
50Music - Representation
- The existence of powerful, low-cost, digital
signal processors means that many computers can
now record, generate and process music.
- Music is also widely used in multimedia
applications, so we require a media type for
music to focus on the computer's musical
capabilities.
- Representation of Music
- Operational v. Symbolic
- operational representations specify exact timings
for music and physical descriptions of the sounds
to be produced
- symbolic representations use descriptive
symbolism to describe the form of the music and
allow great freedom in the interpretation
- both types are described as structural
representations, since instead of representing
music by audio samples there is information about
the internal structure of the music
53Music - Representation
- To illustrate the structural representations, we
can consider two
- MIDI - a widely used protocol allowing the
connection of computers and musical equipment, an
operational representation
- SMDL - a proposal for a standard structure for
documents containing musical information, having
both operational and symbolic aspects
- MIDI
- the Musical Instrument Digital Interface was
developed in the early 1980s by musical equipment
makers
- Devices
- electronic keyboards and synthesisers
- drum machines
- sequencers (to record and play back MIDI
messages)
- music<->film and music<->video synchronisation
equipment
54Music - Representation
- Connection ports
- MIDI OUT - allows a device to send MIDI messages
it has produced to other MIDI devices
- MIDI IN - receives MIDI messages from other MIDI
devices
- MIDI THRU - repeats received messages, permitting
daisy-chaining of MIDI devices
- MIDI devices process MIDI messages differently,
according to their function or to the sound
palette used by the device, hence different
synthesisers can produce different sounds when
supplied with the same MIDI messages
- MIDI Concepts
- Channel - a MIDI connection has 16 message
channels, devices can be set to respond to all
channels or only to specific channels
- Key number - notes are identified by key number,
with 128 key numbers available compared with the
88 keys of a standard keyboard
- Controller - 128 different controllers are
available under the MIDI protocol, though not all
are currently defined, changing the value of a
controller typically alters sound production
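The key-number concept above can be illustrated with a minimal sketch. The slides do not say which pitch each key number stands for, so the code assumes the common convention that key 69 is A4 at 440 Hz, that key 60 is named C4 (some manufacturers call it C3), and that an 88-key piano spans keys 21 to 108. The function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumption: an 88-key piano runs from A0 (key 21) to C8 (key 108).
PIANO_LOW, PIANO_HIGH = 21, 108


def check_key(key: int) -> None:
    """Reject values outside the 128 key numbers the protocol provides."""
    if not 0 <= key <= 127:
        raise ValueError("MIDI key numbers run from 0 to 127")


def key_to_name(key: int) -> str:
    """Note name, assuming key 60 is called C4."""
    check_key(key)
    return f"{NOTE_NAMES[key % 12]}{key // 12 - 1}"


def key_to_frequency(key: int, a4_hz: float = 440.0) -> float:
    """Frequency in Hz, assuming equal temperament with key 69 = A4."""
    check_key(key)
    return a4_hz * 2.0 ** ((key - 69) / 12)


print(key_to_name(60), round(key_to_frequency(60), 2))
print(key_to_name(PIANO_LOW), key_to_name(PIANO_HIGH))
```

A receiving synthesiser is free to map the same key number to any sound in its palette, so the frequency mapping here is only one plausible interpretation.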
55Music - Representation
- Patch/program - an audio palette is called a
program or patch, a synthesiser capable of having
a number of patches active at the same time is
called multi-timbral
- Polyphony - the ability of a synthesiser to play
many notes at a time
- Song - a recorded or preprogrammed MIDI sequence
- Timing clock - a MIDI sequencer timestamps
messages using a timebase measured in parts per
quarter note (PPQ). Typical timebase values are
24, 96 and 480 PPQ. To convert the timebase into
actual time you use the tempo, measured in beats
per minute (BPM), where we assume that one beat is
equal to a quarter note. Thus with a tempo of
180 BPM (one beat every 1/3 s) and a timebase of
96 PPQ, one timebase unit lasts
1/3 x 1/96 s = 3.47 ms
- MIDI synchronisation - MIDI devices can be set to
internal synch or external synch, when set to
internal synch a device is known as a master and
produces a timing clock message on its MIDI OUT
at 24 PPQ which slave devices use for external
synch
- MTC - MIDI Time Code is used to synchronise MIDI
with film or video, used to trigger sound effects
or musical sequences
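The timebase-to-time conversion described under "Timing clock" can be sketched as follows. The function names are hypothetical, and the code makes the same assumption as the slide, that one beat equals one quarter note.

```python
def tick_seconds(tempo_bpm: float, ppq: int) -> float:
    """Duration in seconds of one timebase unit (one 'part' of a quarter note)."""
    if tempo_bpm <= 0 or ppq <= 0:
        raise ValueError("tempo and timebase must be positive")
    seconds_per_beat = 60.0 / tempo_bpm   # one beat = one quarter note
    return seconds_per_beat / ppq


def ticks_to_seconds(ticks: int, tempo_bpm: float, ppq: int) -> float:
    """Convert a sequencer timestamp in ticks to elapsed seconds at a fixed tempo."""
    return ticks * tick_seconds(tempo_bpm, ppq)


# The worked example from the slide: 180 BPM at 96 PPQ.
print(f"{tick_seconds(180, 96) * 1000:.2f} ms")   # 3.47 ms

# Spacing of the 24 PPQ timing clock messages sent by a master at the same tempo.
print(f"{tick_seconds(180, 24) * 1000:.2f} ms")   # 13.89 ms
```

The sketch assumes a constant tempo. A sequence with tempo changes would need the conversion applied piecewise between changes.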
56Music - Representation
- MIDI Protocol
- based on 8-bit code for messages, each message
consists of a single command byte and possibly
one or more data bytes (see table)
- Channel voice messages (8c-Ec, where c is the
channel) - determine the
actual notes played, speed of hit and release and
the values of controllers
- Channel mode messages (Bc, with controllers
121-127) - select the mode of a synthesiser,
responding to one channel or all channels, each
channel separately voiced or all voices used for
one channel
- System messages (F0-FF) - general system
functions, timing clock, MIDI time code messages,
system reset, start device, stop device, etc.
- Limitations of MIDI
- operates at 31250 bps, which allows around 500
notes per second and may not be enough for
complex pieces
- limited number of channels, lack of device
addressing and other flaws make configuring large
MIDI networks difficult
- device dependence of MIDI data
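The message classes and the bandwidth limit above can be sketched in a few lines. The command-byte ranges are those given in the slide. The function names are hypothetical, and the throughput estimate rests on two assumptions not stated in the slide: each byte takes 10 bits on the wire (8 data bits plus a start and a stop bit), and each note costs a 3-byte note-on plus a 3-byte note-off.

```python
BAUD = 31250            # bits per second, from the slide
BITS_PER_BYTE = 10      # assumption: 8 data bits plus one start and one stop bit


def describe(message: bytes) -> dict:
    """Classify a MIDI message by its command byte."""
    status = message[0]
    if status < 0x80:
        raise ValueError("a command byte has its top bit set (0x80-0xFF)")
    if status >= 0xF0:
        return {"kind": "system", "command": status}
    # Channel messages: high nibble is the command, low nibble the channel.
    is_mode = (
        (status & 0xF0) == 0xB0
        and len(message) > 1
        and 121 <= message[1] <= 127
    )
    return {
        "kind": "channel mode" if is_mode else "channel voice",
        "command": status & 0xF0,
        "channel": status & 0x0F,
    }


def notes_per_second(bytes_per_note: int = 6) -> float:
    """Rough upper bound on notes per second over a single MIDI link."""
    return BAUD / BITS_PER_BYTE / bytes_per_note


print(describe(bytes([0x92, 60, 100])))
print(round(notes_per_second()))
```

Under these assumptions the bound comes out a little above 500 notes per second, which is consistent with the figure quoted in the slide.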
57Music - Representation
- SMDL
- the Standard Music Description Language was
developed by the MIPS committee of ANSI
- SMDL encompasses representation of music for
electronic dissemination and production by
software, the representation of scores and
musical examples in printed documents and the
representation of musical annotation and
attributes used for musical analysis or by music
databases
- SMDL is a DTD of SGML, based on a document type
called musical works or works. Each work has 4
hierarchically structured sections
- core section - musical events, such as note
sequences, which form the work
- gestural section - performances of the core,
which may differ in interpretation
- visual section - displays the core in printed
form, includes formatting and lyrics
- analytical section - allows a number of
theoretical analyses on the core, its score and
performances to be included in the work
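The four-section structure of a work can be pictured as a simple container. This is only an illustrative sketch: the class, field names and sample values are hypothetical and do not reproduce SMDL's actual element names or syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical in-memory mirror of the four sections of a work."""
    core: list = field(default_factory=list)        # musical events, e.g. note sequences
    gestural: list = field(default_factory=list)    # performances of the core
    visual: list = field(default_factory=list)      # printed renderings, formatting, lyrics
    analytical: list = field(default_factory=list)  # theoretical analyses


# One core, interpreted by two performances and shown in one printed form.
work = Work(core=["C4 quarter", "E4 quarter", "G4 half"])
work.gestural.append({"performer": "A", "tempo_bpm": 90})
work.gestural.append({"performer": "B", "tempo_bpm": 120})
work.visual.append({"layout": "single staff", "lyrics": None})

print(len(work.core), len(work.gestural), len(work.visual), len(work.analytical))
```

The point of the layout is that the gestural, visual and analytical sections all refer back to a single shared core, so several performances or analyses can coexist in one work.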
58Music - Operations
- In considering music representation, we can
recognise several advantages over audio
- music representation will be more compact than
audio
- it is portable and can be synthesised with the
fidelity and complexity appropriate to the output
devices used
- while digital audio suffers from inherent noise,
musical representations are noise free
- many operations can be performed on music that
would be infeasible or require extensive
processing on audio
- Playback and Synthesis
- during audio playback, the listener has limited
influence over the musical aspects of the
performance, beyond changing the volume or
processing the audio in some way. If music is
produced by synthesis from a structural
representation the listener can
59Music - Operations
- independently change pitch and tempo, increase
or decrease individual instruments' volumes or
change the sounds they produce
- musical representations offer greater potential
for interactivity than audio
- Timing
- structural representation makes timing of musical
events explicit
- the ability to modify tempo makes it possible to
alter the timing of groups of musical events and
adjust the synchronisation of those events with
other events (film, video, etc.)
- Editing and Composition
- basic editing allows the user to modify primitive
events and notes
- more complex editing operations operate on
musical aggregates (chords, bars, etc.) to permit
phrase-repetition, melody replacement and other
such functions
- composition software simplifies the task of
generating and combining or rearranging tracks,
and prints the score
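The operations above are easy to express once music is held as explicit events instead of audio samples. The following is a minimal sketch with a hypothetical `Note` record and helper names, assuming note times are stored in beats so that pitch and tempo can be changed independently.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    start: float      # onset, in beats
    duration: float   # length, in beats
    key: int          # MIDI-style key number, 0-127
    track: str        # instrument or track label


def transpose(notes, semitones):
    """Change pitch without touching timing."""
    shifted = [replace(n, key=n.key + semitones) for n in notes]
    if any(not 0 <= n.key <= 127 for n in shifted):
        raise ValueError("transposition leaves the 0-127 key range")
    return shifted


def onset_seconds(note, tempo_bpm):
    """Change tempo without touching pitch: only the beat-to-time mapping moves."""
    return note.start * 60.0 / tempo_bpm


def repeat_phrase(notes, times, phrase_beats):
    """Phrase repetition: copy a group of notes end to end."""
    return [
        replace(n, start=n.start + i * phrase_beats)
        for i in range(times)
        for n in notes
    ]


phrase = [Note(0, 1, 60, "piano"), Note(1, 1, 64, "piano")]
print([n.key for n in transpose(phrase, 2)])
print(onset_seconds(phrase[1], 120), onset_seconds(phrase[1], 60))
print([n.start for n in repeat_phrase(phrase, 2, 2)])
```

Doing the same to sampled audio would need pitch-shifting or time-stretching, which is the "extensive processing" the slides refer to.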
60Animation
Representation
Cel models
Scene-based models
Event-based models
Key frames
Articulated objects and hierarchical models
Scripting and procedural models
Physically-based and empirical models
Operations
Graphics operations
Motion parameter control
Rendering
Playback
61Animation - Representation
- Separating animation and video follows the same
track we took in separating image and graphics,
based on modelling.
- Animation types provide models which are rendered
to produce video.
- Animation is distinct from graphics in that it is
time-dependent, but as in the image<->video
relationship, sampling an animation model at a
particular time will result in a graphics model,
which can be rendered to produce an image
- Animation Representation
- Cel models
- early animators drew on transparent celluloid
sheets or cels, different sheets contained
different parts of the scene, which was assembled
by overlaying the sheets
- in animation, cels are digital images with a
transparency channel
62Animation - Representation
- scenes are rendered by drawing the cels back to
front, with movement being added by changing the
position of cels from one frame to the next
- a cel model is therefore a set of images, their
back to front order, and their relative position
and orientation in each frame
- Scene-based models
- simply a sequence of graphics models, each
representing a complete scene
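The cel model described earlier on this slide (images with a transparency channel, drawn back to front, moved between frames) can be sketched as below. The `Cel` class and `render_frame` function are hypothetical, pixels are simplified to a grey value plus an alpha value, and orientation is left out so that only position changes between frames.

```python
from dataclasses import dataclass


@dataclass
class Cel:
    pixels: list   # rows of (grey, alpha) tuples; grey 0-255, alpha 0.0-1.0


def render_frame(cels, positions, width, height, background=0.0):
    """Draw cels back to front.

    cels      -- list of Cel in back-to-front order
    positions -- (x, y) offset of each cel in this frame
    """
    frame = [[float(background)] * width for _ in range(height)]
    for cel, (ox, oy) in zip(cels, positions):
        for row_index, row in enumerate(cel.pixels):
            for col_index, (grey, alpha) in enumerate(row):
                x, y = ox + col_index, oy + row_index
                if 0 <= x < width and 0 <= y < height:
                    # Blend the cel pixel over whatever is already drawn.
                    frame[y][x] = alpha * grey + (1.0 - alpha) * frame[y][x]
    return frame


backdrop = Cel([[(100, 1.0)] * 3])    # opaque 3x1 background cel
character = Cel([[(200, 1.0)]])       # opaque 1x1 foreground cel

# Movement: the same two cels, with the foreground cel at a new position.
print(render_frame([backdrop, character], [(0, 0), (0, 0)], 3, 1))
print(render_frame([backdrop, character], [(0, 0), (2, 0)], 3, 1))
```

Only the list of positions differs between the two frames, which is what makes the cel model compact compared with storing a complete scene per frame.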