Audio Databases - PowerPoint PPT Presentation

Title:

Audio Databases

Slides: 74
Provided by: yllo

Transcript and Presenter's Notes

Title: Audio Databases


1
Audio Databases
2
Metadata
  • Metadata is used to represent audio content in
    much the same way as for video.
  • The metadata used to represent audio content may
    be viewed as a set of objects spread out over a
    timeline.
  • We may index the metadata associated with audio
    in exactly the same way as we indexed video, and
    the same query-processing techniques may be used
    over again.

3
Example
  • The following figure shows the line segments
    associated with part of an opera.
  • Activity1 may be Act 1 of the opera, activity2
    may be Act 1, Scene 1, and so on.

4
Example (cont.)
  • Each activity may have an associated set of
    fields.
  • Singers: a set-valued field containing records
    with a Role, SingerType and SingerName. If the
    triple (Lohengrin, Tenor, Rene Kollo) appears in
    the segment [50, 100), then Rene Kollo, a tenor,
    sings the role of Lohengrin during the time
    segment [50, 100) of the opera.
  • Score: a field of type music_doc that points to
    the relevant part of the music score associated
    with the time segment [50, 100).
  • Transcript: a field of type document that points
    to the relevant part of the libretto for the
    time segment [50, 100).
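
The activity records above can be sketched in Python as follows. This is only an illustration: the class name `Activity`, the function `singers_at`, and the interval used for "Act 1" are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    """One metadata object on the audio timeline, valid on [start, end)."""
    name: str
    start: int
    end: int
    # Each entry is a (role, singer_type, singer_name) triple.
    singers: list = field(default_factory=list)

def singers_at(activities, t):
    """Return all singer triples of activities whose segment covers time t."""
    found = []
    for act in activities:
        if act.start <= t < act.end:  # half-open interval [start, end)
            found.extend(act.singers)
    return found

# Hypothetical timeline: only the [50, 100) triple is taken from the slide.
opera = [
    Activity("Act 1", 0, 200),
    Activity("Act 1, Scene 1", 50, 100,
             [("Lohengrin", "Tenor", "Rene Kollo")]),
]

print(singers_at(opera, 75))   # the Lohengrin triple
print(singers_at(opera, 100))  # empty: 100 lies outside [50, 100)
```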

5
Signal-Based Audio Content
  • In some applications, creating metadata is
    difficult, for example when the speaker is
    unknown or the content is unclear.
  • Audio data is then treated as a signal over
    time x.
  • Different features of the signal are extracted,
    indexed and stored for efficient retrieval.
  • Metadata may still be used to complement the
    signal data.

6
Sample Audio Signals
7
Signal
  • Period of vibration, T: the time taken for a
    particle in the wave to return to its starting
    position, e.g. from point A to point B.
  • Frequency of vibration, f: the number of
    vibrations per second, f = 1/T.
  • Velocity, v: the speed at which the crests and
    troughs move to the right, v = w/T = w × f,
    where w denotes the wavelength of the wave.
  • Amplitude, a: the maximum intensity of the
    signal associated with the wave.
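
The relations f = 1/T and v = w × f can be checked with a small Python sketch. The function names and the sample numbers (a 2 ms period and a 0.686 m wavelength) are illustrative assumptions, not values from the slides.

```python
def frequency(period_s):
    """f = 1/T: vibrations per second for a period given in seconds."""
    return 1.0 / period_s

def velocity(wavelength_m, period_s):
    """v = w/T = w * f for a wavelength w in meters."""
    return wavelength_m * frequency(period_s)

# Hypothetical wave: period 0.002 s, wavelength 0.686 m.
print(frequency(0.002))        # about 500 vibrations per second
print(velocity(0.686, 0.002))  # about 343 meters per second
```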

8
Indexing by Segmentation
  • Split up the audio signal into relatively
    homogeneous windows. This may be done in one of
    two ways:
  • The application developer specifies, a priori, a
    window size w (in seconds or minutes) and assumes
    that the wave's properties within each window
    are obtained by averaging.
  • Use a homogeneity predicate, as in the case of
    images, except that this predicate applies to
    the one-dimensional case.

9
Windowing Using audio signal
  • The following figure shows a nonhomogeneous audio
    signal. After it is split into five windows, each
    window is homogeneous in the sense that it has a
    constant amplitude, wavelength, and wave velocity.

10
Indexing Using Feature Extraction
  • After segmentation, the audio signal may be
    viewed as a sequence of n windows, w1, ..., wn.
  • For each window, we extract some features
    associated with the audio signal.
  • If k features are extracted, then an audio signal
    may be considered to be a sequence of n points in
    a k-dimensional space.
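
A minimal sketch of this idea, assuming fixed-size windows and k = 2 features. The two features chosen here (peak amplitude and root-mean-square level) and all function names are assumptions for illustration; the slides do not prescribe them.

```python
import math

def split_into_windows(samples, window_size):
    """Cut the samples into consecutive windows of window_size samples.
    The last window may be shorter if the length is not a multiple."""
    return [samples[i:i + window_size]
            for i in range(0, len(samples), window_size)]

def window_features(window):
    """Return a k = 2 feature vector: (peak amplitude, RMS level)."""
    peak = max(abs(s) for s in window)
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return (peak, rms)

def signal_to_points(samples, window_size):
    """Turn a signal into a sequence of n points in k-dimensional space."""
    return [window_features(w)
            for w in split_into_windows(samples, window_size)]

# Hypothetical signal: a quiet window followed by a louder one.
samples = [1, -1, 1, -1, 3, -3, 3, -3]
print(signal_to_points(samples, 4))  # [(1, 1.0), (3, 3.0)]
```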

11
Example Features
  • Intensity (I): the power of the signal generated
    by the wave (in watts per square meter). It
    depends on ρ, the density of the material through
    which the sound is propagated.
  • Loudness (L): defined relative to L0, where L0
    denotes the loudness at the lowest frequency
    (about 15 Hz) that a human ear can detect.

12
Content Index
  • In general, to index the content of an audio
    signal, we proceed in the following two steps:
  • Find a set w1, ..., wn of window segments.
  • For each window wi, store a vector consisting of
    K acoustical attributes.
  • An audio database may be viewed as a set of
    (K+3)-tuples consisting of the audio source
    (audio file), the window (within that audio
    file), the duration of the window, and the K
    feature values associated with that window.
  • A k-d tree can be used to index audio data.
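
As a rough illustration of indexing window feature vectors with a k-d tree, the sketch below builds a tree over (feature vector, payload) pairs and runs a nearest-neighbor query. The file names, window numbers and feature values are invented for the example, and this simple recursive build is only one possible way to construct such a tree.

```python
def build_kdtree(points, depth=0):
    """points: list of (feature_vector, payload) pairs.
    Returns a nested dict node, or None for an empty list."""
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through the K axes
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best=None):
    """Return the (feature_vector, payload) pair closest to target."""
    if node is None:
        return best
    vec = node["point"][0]
    if best is None or sq_dist(vec, target) < sq_dist(best[0], target):
        best = node["point"]
    diff = target[node["axis"]] - vec[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is close enough.
    if diff * diff < sq_dist(best[0], target):
        best = nearest(far, target, best)
    return best

# Hypothetical (K+3)-tuples with K = 2: payload = (source, window, duration).
windows = [
    ((1.0, 1.0), ("aria.wav", 0, 4)),
    ((3.0, 3.0), ("aria.wav", 1, 4)),
    ((2.0, 0.5), ("chorus.wav", 0, 4)),
]
tree = build_kdtree(windows)
print(nearest(tree, (2.9, 2.8))[1])  # ('aria.wav', 1, 4)
```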

13
Content-based Retrieval for Music Databases
14
Introduction
  • The management of large collections of music data
    in a multimedia database has received much
    attention in the past few years.
  • For music content-based retrieval, we can extract
    the features, such as melodies, rhythms and
    chords, from the music data and develop indices
    that will help to retrieve the relevant music
    data quickly.

15
Music Feature string
Ex: sol-do-re-mi-mi-mi-mi-re-mi-do-do
Melody feature string: eabccccbaa
Rhythm string: 1-1-1-2-2-1-1-1-1-2-2
Music feature string: e1a1b1c2c2c1c1b1a2a2
(a sample of You Are My Sunshine)
16
Features of Music Data
  • Coding scheme: a music object → a sequence of
    music segments
  • music segment = (segment type, segment duration,
    segment pitch)
  • four segment types: type A, type B, type C, and
    type D

17
Features of Music Data
  • For example, the sequence of music segments:

(B,3,-3) (A,1,1) (D,3,-3) (B,1,-2) (C,1,2) (C,1,2) (C,1,1)
18
music segment (type, duration, pitch)
19
Music Data Retrieval System Architecture
20
Indexing
  • String indexing for music data: suffix tree
  • Numeric indexing for music data: R-tree

21
Suffix tree
  • A suffix tree is an index structure that has been
    proposed to locate strings that are exactly
    matched to a target string.
  • No two edges out of a node can have edge-labels
    beginning with the same character.
  • For any leaf i, the concatenation of the
    edge-labels on the path from the root to leaf i
    spells out exactly the suffix of the string that
    starts at position i.
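
The properties above can be illustrated with a simplified stand-in: an uncompressed suffix trie built from nested dictionaries. This is not a space-efficient suffix tree (it uses one node per character rather than one edge per substring), and the function names and the "$" stop symbol are assumptions for the sketch. Each leaf stores the 1-based start position i of its suffix.

```python
def build_suffix_trie(text, stop="$"):
    """Insert every suffix of text + stop into a nested-dict trie.
    The leaf of each suffix stores its 1-based start under key 'pos'."""
    text += stop
    root = {}
    for start in range(len(text) - 1):   # skip the suffix that is only "$"
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node["pos"] = start + 1
    return root

def find_occurrences(root, pattern):
    """Return the sorted 1-based positions where pattern occurs exactly."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    positions, stack = [], [node]
    while stack:                          # collect every leaf below the match
        cur = stack.pop()
        for key, child in cur.items():
            if key == "pos":
                positions.append(child)
            else:
                stack.append(child)
    return sorted(positions)

trie = build_suffix_trie("ababc")
print(find_occurrences(trie, "ab"))  # [1, 3]
print(find_occurrences(trie, "bc"))  # [4]
```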

22
Ex: the suffixes of ababc are ababc, babc, abc, bc, c
23
Ex: Do Re Do Re Mi → ababc
24
Numeric Mapping
  • Numeric mapping function
  • v(m): the integer value of a segment of m
    adjacent notes
  • m: the number of adjacent notes taken from the
    melody feature string
  • P(xi): the integer value of each note xi,
    1 ≤ i ≤ m

25
Numeric Mapping (cont.)
  • For example: a music feature string denoted by
    bcdbc, with n = 10 and m = 4
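
The mapping formula itself did not survive in this transcript. The sketch below assumes one formula that reproduces the integer values printed on the later search slides (1322 and 1132 for ccdb and cdbb, 3211 and 3210 for bbcd and abcd): P(a) = 0, P(b) = 1, P(c) = 2, P(d) = 3, and v(m) = Σ P(xi) · n^(i−1) for 1 ≤ i ≤ m. Treat both the formula and the function names as assumptions.

```python
def note_value(note):
    """P(x): assumed note-to-integer mapping a -> 0, b -> 1, c -> 2, ..."""
    return ord(note) - ord("a")

def segment_value(segment, n=10):
    """Assumed v(m): sum of P(x_i) * n**(i - 1) over the m notes."""
    return sum(note_value(x) * n ** i for i, x in enumerate(segment))

def window_values(melody, m=4, n=10):
    """Integer value of every window of m adjacent notes in the melody."""
    return [segment_value(melody[i:i + m], n)
            for i in range(len(melody) - m + 1)]

print(window_values("ccdbb"))  # [1322, 1132]
print(segment_value("bbcd"))   # 3211
```

Each value can then be stored as a numeric key, which is what makes a numeric index such as an R-tree applicable to melody strings.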

26
Example
  • Two Tigers (S1: Do Re Mi Do Do Re Mi Do)

[Figure: the integer values of the music of Two Tigers]
27
Numeric Indexing Structure (R-Tree)
[Figure: R-tree with non-leaf nodes, leaf nodes and link lists]
28
Pitch Change
  • abca and bcdb have the same pitch changes:
    1, 1, -2
  • m: the number of adjacent notes taken from the
    melody feature string
  • Adj: the maximum value of the distance between
    two pitches
  • D: the total number of pitch distances
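
A short sketch of the pitch-change encoding, assuming each note is a single letter and that the distance between two notes is the difference of their alphabet positions (the function name is hypothetical). It shows why abca and bcdb are treated alike: transposing a melody leaves its pitch changes unchanged.

```python
def pitch_changes(melody):
    """Differences between each pair of adjacent notes in the melody."""
    return [ord(b) - ord(a) for a, b in zip(melody, melody[1:])]

print(pitch_changes("abca"))  # [1, 1, -2]
print(pitch_changes("bcdb"))  # [1, 1, -2]
```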

29
Example: abcaabca. Suppose m = 10, Adj = 9, D = 19.
30
Numeric Index
31
Searching in Numeric Index
  • Exact matching
  • For example, the music query segment is ccdbb.
  • ccdbb → ccdb (1322)
  •       → cdbb (1132)
32
[Figure: R-tree with non-leaf nodes, leaf nodes and link lists]
  • s2,s3? s2,s3? s2,s3
  • position_s2 ? 2,3),position_s3 ? 1,4) ?s2.

33
Approximate Searching
We can examine the difference between the
transformed value of the query string and
existing data.
  • n: the number of pitches
  • m: the number of adjacent notes taken from the
    melody feature string
  • h: the distance between two pitches

34
Example
Ex: approximate matching conditions for m = 4, n = 10, h = 1
bbcd: note values 1 1 2 3, integer value 3211
abcd: note values 0 1 2 3, integer value 3210
35
Multi-Feature indexing
  • Combined Suffix Tree
  • Independent Suffix Tree
  • Twin Suffix Tree
  • Grid-Twin Suffix Tree
  • Numeric Index
  • Hybrid Multi-Feature Index

36
Combined Suffix Tree
  • In the Combined Suffix Tree, the feature strings
    are used directly to construct the index.

Ex: a1a2b1 → 12, 7; 121 → 12, 7, 1, 6
37
Independent Suffix Tree
  • The Independent Suffix Tree approach separates
    the feature string into a melody string and a
    rhythm string and stores them in two independent
    suffix trees.

(Melody: ababc)
(Rhythm: 12122)
constructed from a1b2a1b2c2
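
The separation step can be sketched as follows, assuming every note is a one-letter pitch followed by a one-digit duration, as in a1b2a1b2c2. The function names are hypothetical; each resulting string would then be indexed by its own suffix tree.

```python
def split_feature_string(feature):
    """Split a music feature string into (melody, rhythm) strings."""
    melody = feature[0::2]   # characters at even offsets are pitches
    rhythm = feature[1::2]   # characters at odd offsets are durations
    return melody, rhythm

def all_suffixes(s):
    """The suffixes that a suffix tree over s would index."""
    return [s[i:] for i in range(len(s))]

melody, rhythm = split_feature_string("a1b2a1b2c2")
print(melody, rhythm)        # ababc 12122
print(all_suffixes(rhythm))  # ['12122', '2122', '122', '22', '2']
```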
38
Twin Suffix Tree
  • The Twin Suffix Tree is constructed by adding
    extra information to the Independent Suffix Tree.
  • This index structure consists of a melody suffix
    tree and a rhythm suffix tree, with links between
    their corresponding nodes.

39
Twin Suffix Tree
  • The Twin Suffix Tree constructed from
    a1b2a2b1a2b2c2

40
Grid-Twin Suffix Tree
  • Use a hash function to map each suffix of the
    feature string into a specific bucket of a 2D
    grid.
  • The hash function uses the first n symbols of the
    suffix to map it into a specific bucket.

41
Grid-Twin Suffix Tree
a1b2a2c1a3
42
Condensed Grid-Twin Suffix Tree
43
Condensed Grid-Twin Suffix Tree
  • abaca
  • caaca

44
Multi-Feature Numeric Indexing for Music Data
45
Multi-Feature Numeric Indexing for Music Data
46
Multi-Feature Numeric Indexing for Music Data
[Figure: index space with axes melody (up to 500), rhythm (up to 500) and chord]
47
Hybrid Multi-Feature Index
  • Uses a multi-feature tree structure instead of
    the grid structure of the Grid-Twin Suffix Tree
    (GTST).

48
Suffix Trees with Bit Arrays
  • Instead of links between corresponding feature
    nodes, as in the Twin Suffix Tree, bit arrays are
    created to indicate the relationships between
    the suffix trees.

49
Feature Extraction of Music Data
  • A sequence of notes that appears more than once
    in a music object is called a repeating pattern.
  • Much research in musicology and music psychology
    agrees that the repeating pattern is one of the
    general features in modeling music structure.

50
Repeating Patterns of Music Data
  • Repeating pattern: a substring of the string S
    that appears more than once and whose length is
    equal to or greater than 2.
  • Non-trivial repeating pattern: a repeating
    pattern X is non-trivial if no other repeating
    pattern that contains X has the same frequency
    in S as X.
  • Fault-tolerant non-trivial repeating pattern:
    sequences that differ in only a few notes are
    treated as occurrences of the same non-trivial
    repeating pattern.

51
Example
  • Consider the melody string C-D-E-F-C-D-E-C-D-E-F;
    this melody string has ten repeating patterns:

freq(C-D-E-F) = freq(D-E-F) = freq(E-F) = freq(F) = 2
freq(C-D-E) = freq(C-D) = freq(D-E) = freq(C) = freq(D) = freq(E) = 3
=> only C-D-E-F and C-D-E are non-trivial.
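
The example can be reproduced with a brute-force sketch. It writes each note as one character (CDEFCDECDEF), counts overlapping occurrences, and uses `min_len=1` so that single notes count, as they do in the ten patterns of this example. The function names are assumptions, and this quadratic enumeration is only a reference, not one of the extraction techniques listed in the slides.

```python
from collections import Counter

def repeating_patterns(s, min_len=1):
    """Map every substring that occurs at least twice to its frequency."""
    counts = Counter(
        s[i:j]
        for i in range(len(s))
        for j in range(i + min_len, len(s) + 1)
    )
    return {p: c for p, c in counts.items() if c >= 2}

def non_trivial(patterns):
    """Keep patterns not contained in another pattern of equal frequency."""
    return {
        p: c
        for p, c in patterns.items()
        if not any(q != p and p in q and patterns[q] == c for q in patterns)
    }

found = repeating_patterns("CDEFCDECDEF")
print(len(found))          # 10
print(non_trivial(found))  # {'CDE': 3, 'CDEF': 2} in some order
```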
52
Music Feature Extractions
  • Correlative Matrix
  • FastPET
  • RP-Tree
  • 2RC
  • Similar Non-trivial Repeating Pattern
  • Fault Tolerance Non-trivial Repeating Patterns

53
CORRELATIVE MATRIX
CS: candidate set, with entries CS(pattern, rep_count, sub_count)
There are four cases for updating CS:
1. T[i,j] = 1 and T[i+1,j+1] = 0.
   Ex: T[1,4] = 1 and T[2,5] = 0 => insert ("C",1,0) into CS
2. T[i,j] = 1 and T[i+1,j+1] ≠ 0.
   Ex: T[1,5] = 1 and T[2,6] ≠ 0 => modify to ("C",2,1)
3. T[i,j] > 1 and T[i+1,j+1] ≠ 0.
   Ex: T[2,6] = 2 and T[3,7] ≠ 0 => insert ("CA",1,1), ("A",1,1) into CS
4. T[i,j] > 1 and T[i+1,j+1] = 0.
   Ex: T[4,8] = 4 and T[5,9] = 0 => insert ("CAAC",1,0), ("AAC",1,1),
   ("AC",1,1) into CS and change ("C",6,1) into ("C",7,2)
The correlative matrix of the string
S = "CAACCAACDCBC"
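
The matrix entries quoted in the four cases can be reproduced with the sketch below. It assumes the usual fill rule for such a matrix, which is consistent with every entry quoted above: for i < j, T[i,j] = T[i-1,j-1] + 1 when the notes at positions i and j are equal, and 0 otherwise. Maintaining the candidate set CS is not shown, and the function names are hypothetical.

```python
def correlative_matrix(s):
    """Upper-triangular matrix T with 1-based indices (row/column 0 unused).
    T[i][j] is the length of the matching run ending at positions i and j."""
    n = len(s)
    t = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if s[i - 1] == s[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
    return t

def longest_repeat(s):
    """Longest repeated substring, read off the largest matrix entry."""
    t = correlative_matrix(s)
    length, end = max(
        (t[i][j], i)
        for i in range(1, len(s) + 1)
        for j in range(i + 1, len(s) + 1)
    )
    return s[end - length:end]

T = correlative_matrix("CAACCAACDCBC")
print(T[1][4], T[2][6], T[3][7], T[4][8], T[5][9])  # 1 2 3 4 0
print(longest_repeat("CAACCAACDCBC"))               # CAAC
```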
54
CORRELATIVE MATRIX (cont.)
There are two more tasks:
1. If a repeating pattern is a substring of another
   repeating pattern and their frequencies are the
   same, it is removed from the candidate set CS.
   Ex: ("CA",1,1), ("CAA",1,1), ("AA",1,1), ("AAC",1,1)
   and ("AC",1,1) are removed, since they are all
   substrings of the repeating pattern ("CAAC",1,0).
2. The real repeating frequency must be calculated
   for every repeating pattern found.
   Ex: "C"
f
55
RP-TREE
The RP-tree for the music feature string
S = ABCDEFGHABCDEFGHIJABC
56
RP-TREE (cont.)
57
FastPET Fast Pattern Extracting Technique
58
FastPET (cont.)
[Figure: string positions 1 to 14, with abc marked]
P8: 3; P11: 3
PatternSet: (abc, 3)
59
FastPET (cont.)
[Figure: string positions 1 to 14]
P8: 3; P11: 3, 4
PatternSet: (abc, 3), (abcd, 2)
60
FastPET (cont.)
Non-trivial RP for abcdbcdabcabcd
P5: 3; P8: 3; P9: 2; P11: 3, 4; P12: 3
PatternSet: (bc, 4), (abc, 3), (bcd, 3), (abcd, 2)
61
2RC (Two-Row Comparison)
  • 2RC saves memory: it needs only O(n) space.
  • Example: S = abcdbcdabcabcd
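
The slides do not spell out the 2RC procedure in this transcript. The sketch below only illustrates how O(n) memory is possible, under the assumption that the comparison keeps two rows (here called row A and row B) of a correlative-style matrix at a time instead of the whole matrix. It finds the longest repeated substring; building the full PatternSet is not attempted, and the function name is hypothetical.

```python
def longest_repeat_two_rows(s):
    """Longest repeated substring of s using two rows of length n + 1."""
    n = len(s)
    row_a = [0] * (n + 1)             # values for the previous row, i - 1
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        row_b = [0] * (n + 1)         # values for the current row, i
        for j in range(i + 1, n + 1):
            if s[i - 1] == s[j - 1]:
                row_b[j] = row_a[j - 1] + 1
                if row_b[j] > best_len:
                    best_len, best_end = row_b[j], i
        row_a = row_b                 # row B becomes row A for the next i
    return s[best_end - best_len:best_end]

print(longest_repeat_two_rows("abcdbcdabcabcd"))  # abcd
```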

i = 1: [Figure: Row A over string positions 1 to 14]
62
2RC (cont.)
i = 2: [Figure: Row A and Row B over string positions 1 to 14]
63
2RC (cont.)
i = 3: [Figure: Row A and Row B over string positions 1 to 14]
64
2RC (cont.)
i = 4: [Figure: Row A and Row B over string positions 1 to 14]
PatternSet: (abc, 3)
65
True suffix tree approach for non-trivial
repeating pattern discovery (TRP)
  • Step 1. Construct the suffix tree after appending
    a stop symbol to the end of string S.
  • Step 2. Find the repeating patterns.
  • Step 3. Pattern sweeping.

66
Example 1 - Step 1 of TRP
  • True suffix tree of S = abcdbcdabcabcd.

67
Example 1 - Step 2 of TRP
  • All repeating patterns of music object
    S = abcdbcdabcabcd.

68
Example 1 - Step 3 of TRP
  • Pattern sweeping for music object
    S = abcdbcdabcabcd.

Non-trivial repeating patterns
69
Example 2 - TRP
  • Pattern sweeping for repeating patterns of
    S = aaaaaaaaaa.

Non-trivial repeating pattern
70
Fault Tolerant Non-trivial Repeating Pattern
Discovering
  • Step 1. Constructing Suffix Tree
  • Step 2. Creating Repeating Pattern Table
  • Step 3. Greedy Concatenating Repeating Patterns
  • Step 4. Extracting Fault Tolerant Non-trivial
    Repeating Patterns

71
Step 2 of FTRP
  • Creating Repeating Pattern Table

72
Step 3 of FTRP
  • Greedy Concatenating Repeating Patterns

bc?dae
[Figure: repeating patterns (RP) concatenated with fault 1 and fault 0]
73
Step 4 of FTRP
  • Extracting Fault Tolerant Non-trivial Repeating
    Patterns

bc and dae are both contained in bc?dae
74
Performance Study
  • The Effect on Repeating Pattern Found

75
  • Hit Ratio Improvement