Title: CUIDADO UPF status report
1. CUIDADO UPF status report
- M2 meeting
- March 2002
- Perfecto Herrera
- Universitat Pompeu Fabra
2. Outline
- Music Description Schemes (Deliverable 2.2.1)
- Modules Developed (Deliverable 2.2.2)
- Melody description extraction module
- Rhythm description extraction module
- Interesting segments extraction module
- Timescaling module
- Other contributions
- What's next?
3. Music Description Schemes (Deliverable 2.2.1)
- Segment description schemes
- Melody description schemes
- Rhythm description schemes
- Instrument description schemes
- Some useful DSs currently in MPEG-7
4. Music Description Schemes: Goals
- Develop music description schemes that can be used by the different CUIDADO prototypes and partners
- Keep MPEG-7 XML-schema compatibility
- Address different musical layers
- Address different abstraction levels (from lower to higher)
5. Segment description
- An AudioSegment is the basic unit of audio description.
- Segments can be decomposed into other segments.
- Descriptors and other audio DSs can be included in segment descriptions.
6. Segment description
Some segment boundaries are clearer than others; we therefore need a low-level descriptor that indicates their reliability.
7. Melody description
8. Melody description: pending issues
- Mid-level descriptors derived from the pitch contour: scale, tonality, interval distributions, tessitura... (a sketch of two of these follows this slide)
- Mid-level descriptors derived from deviations from a score or grid
- Higher-level subjective terms (energetic, solemn, etc.)
Input from other partners is needed
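As an illustration of the first pending item, a minimal sketch of two such mid-level descriptors, assuming the melody extractor provides note pitches as MIDI numbers (function names are hypothetical):

from collections import Counter

def interval_distribution(midi_pitches):
    # Histogram of successive intervals (in semitones),
    # normalized to relative frequencies.
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    counts = Counter(intervals)
    total = sum(counts.values()) or 1
    return {iv: n / total for iv, n in counts.items()}

def tessitura(midi_pitches):
    # Pitch range (lowest, highest) of the phrase.
    return min(midi_pitches), max(midi_pitches)

# Example: an ascending and descending C major arpeggio
print(interval_distribution([60, 64, 67, 72, 67, 64, 60]))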
9. Rhythm description
10. Rhythm description: pending issues
- Tempo variation
- Meter extraction
- Deviations from structure
- Connections with instrument labelling
- Rhythm pattern taxonomies
- Higher-level subjective terms
Input from other partners is needed
11. Instrument description
The ClassificationScheme DS can be used to build conceptual taxonomies (e.g. genre, subjective ratings, etc.). The SoundCategory DS can be used specifically for instrument taxonomies.
12. Instrument description
The current SoundModel DS assumes a SpectrumBasis decomposition; other possible modeling strategies should be included.
13. Auxiliary DSs (no new proposals here)
- Creation and Production DSs are the right structures for describing information about the creation and production of the multimedia content (examples: who the producer was, where it was recorded, how it is subjectively rated).
- Media DSs are the structures for describing the media-specific information of the multimedia data (examples: physical format, profiles of coding parameters).
- Usage DSs are intended for describing the possible ways of using, or having used, the content (examples: owners of the rights, usage permissions (where, when, how, by whom), cost of the creation, user settings...).
14. Music Description Extractors and Transformers (Deliverable 2.2.2)
- Interesting segments extractor
- Melody description extractor
- Rhythm description extractor
- Timescaling module
15. Interesting segments extractor
- Implementation of Foote's (2000) algorithm (low self-similarity means novelty; see the sketch after this list)
- The self-similarity scale is user-controlled
- "Interestingness" is defined by choosing a set of low-level descriptors
- Best run in supervised mode (the user gets graphs, then selects better parameter values)
- Output: segment boundary points and a novelty measure
- Functional dependency on low-level descriptors (D2.1.1); integration not yet solved
- Enhanced functionalities when combined with some post-processing (e.g. solo location)
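A minimal sketch of the novelty computation, after Foote (2000), assuming NumPy and a frame-wise matrix of the chosen low-level descriptors (parameter names are illustrative; kernel_size plays the role of the user-controlled self-similarity scale):

import numpy as np

def novelty_curve(features, kernel_size=32):
    # features: (n_frames, n_descriptors) matrix of low-level descriptors.
    # Peaks of the returned curve are candidate segment boundaries.
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    unit = features / norms
    S = unit @ unit.T  # cosine self-similarity matrix

    # Checkerboard kernel: +1 on within-segment blocks, -1 on cross-segment blocks
    k = kernel_size
    sign = np.ones((2 * k, 2 * k))
    sign[:k, k:] = -1
    sign[k:, :k] = -1

    n = S.shape[0]
    novelty = np.zeros(n)
    for i in range(k, n - k):
        # Correlate the kernel with the region around the diagonal
        novelty[i] = np.sum(S[i - k:i + k, i - k:i + k] * sign)
    return novelty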
16. Interesting segments extractor
17. Interesting segments extractor
- Note segmentation using energy, kurtosis, and F0
18. Interesting segments extractor
- Section segmentation using energy, centroid and
kurtosis
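Slides 17 and 18 run the extractor over different descriptor sets; a NumPy sketch of the frame descriptors they name (how the original modules defined kurtosis is not stated; this sketch uses the fourth standardized moment of the magnitude spectrum):

import numpy as np

def frame_descriptors(frame, sr):
    # Energy, spectral centroid, and spectral kurtosis of one audio frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    p = spectrum / (np.sum(spectrum) + 1e-12)  # spectrum as a distribution
    energy = float(np.sum(frame ** 2))
    centroid = float(np.sum(freqs * p))
    spread = float(np.sqrt(np.sum((freqs - centroid) ** 2 * p)))
    kurt = float(np.sum((freqs - centroid) ** 4 * p) / (spread ** 4 + 1e-12))
    return energy, centroid, kurt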
19. Melody description extractor
- Currently intended only for monophonic phrases
- Limitations:
- Note segmentation needs improvement (better transient detection)
20. Melody description extractor
21. Rhythm description extraction module
- Extracts timing and rhythmic data from drum loops
- Consists of several sub-modules:
- Onset detector (a sketch follows this list)
- Pulse (i.e. rhythmic level) detectors:
- Tick
- Tempo
- Instrument labeler (yet to be implemented)
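A minimal sketch of what the onset-detector sub-module could look like; half-wave rectified spectral flux with simple peak picking is an assumption, not necessarily the method implemented in the module:

import numpy as np

def onset_times(audio, sr, frame=1024, hop=512, threshold=1.5):
    # Returns onset times in seconds; the pulse detectors (tick, tempo)
    # would then look for periodicities among these times.
    window = np.hanning(frame)
    n_frames = 1 + (len(audio) - frame) // hop
    prev, flux = None, []
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(audio[i * hop:i * hop + frame] * window))
        if prev is not None:
            flux.append(np.sum(np.maximum(spec - prev, 0.0)))  # rectified flux
        prev = spec
    flux = np.asarray(flux)
    mean = flux.mean() + 1e-12
    # Keep local maxima that exceed a multiple of the mean flux
    return [(i + 1) * hop / sr
            for i in range(1, len(flux) - 1)
            if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
            and flux[i] > threshold * mean]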
22. Rhythm description extraction module
23. Instrument description: instrument labelling
- Towards automatic labelling of percussive slices
- First step: automatic labelling of isolated percussive sounds
- Next: proceed with mixtures of 2 and 3 sounds
24. Rhythm description: instrument labelling (submitted to ICMC 2002)
Taxonomy levels considered:
- Kick-Snare-OpenHH-ClosedHH-HiTom-MidTom-LoTom-Crash-Ride
- Membranes-Plates
- Kick-Snare-Hihat-Tom-Cymbals
Examples of induced rules (rendered as code at the end of this slide):
- IF SKEWNESS > 4.619122 AND B40HZ70HZ > 7.784892 AND MFCC3 < 1.213368 THEN Kick (105.0/0.0)
- IF KURTOSIS > 26.140138 AND TEMPORALCE < 0.361035 AND ATTZCR > 1.478743 THEN Tom (103.0/0.0)
- IF B710KHZ < 0.948147 AND KURTOSIS < 26.140138 AND ATTZCR < 22.661397 THEN Snare (133.0/0.0)
- IF SPECCENTROID > 11.491498 AND B1015KHZ > 0.791702 THEN HH (100.0/2.0)
- IF SKEWNESS < 4.485531 AND B160HZ190HZ < 5.446338 AND MFCC3VAR > 0.212043 AND MFCC4 > -0.435871 THEN Cymbal (110.0/3.0)
- Database with > 600 rock drum sounds
- Initial set of 50 descriptors reduced to 20-30 after feature selection
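The induced rules above translate directly into code; a sketch (the descriptor dictionary keys follow the slide's names; sounds matched by no rule are left unclassified):

def classify_drum(d):
    # d: dict of descriptor values for one isolated percussive sound.
    if d["SKEWNESS"] > 4.619122 and d["B40HZ70HZ"] > 7.784892 and d["MFCC3"] < 1.213368:
        return "Kick"
    if d["KURTOSIS"] > 26.140138 and d["TEMPORALCE"] < 0.361035 and d["ATTZCR"] > 1.478743:
        return "Tom"
    if d["B710KHZ"] < 0.948147 and d["KURTOSIS"] < 26.140138 and d["ATTZCR"] < 22.661397:
        return "Snare"
    if d["SPECCENTROID"] > 11.491498 and d["B1015KHZ"] > 0.791702:
        return "HH"
    if (d["SKEWNESS"] < 4.485531 and d["B160HZ190HZ"] < 5.446338
            and d["MFCC3VAR"] > 0.212043 and d["MFCC4"] > -0.435871):
        return "Cymbal"
    return None  # no rule fired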
25. Timescaling module
- Time compression and expansion of audio content (a toy sketch follows this list)
- Very high quality: no timbre or pitch alteration; transient and stereo-image preservation
- Usable for content-based transformations (given content descriptions), not only as-is
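For orientation only, a bare-bones overlap-add time-scaling sketch in NumPy; unlike the CUIDADO module, plain OLA preserves neither transients nor the stereo image and introduces phasing artifacts:

import numpy as np

def ola_timescale(audio, factor, frame=2048, hop=512):
    # factor > 1 stretches, factor < 1 compresses (mono signal assumed).
    window = np.hanning(frame)
    out = np.zeros(int(len(audio) * factor) + frame)
    norm = np.zeros_like(out)
    t_out = 0
    while True:
        t_in = int(t_out / factor)  # read position advances slower or faster
        if t_in + frame > len(audio):
            break
        out[t_out:t_out + frame] += audio[t_in:t_in + frame] * window
        norm[t_out:t_out + frame] += window
        t_out += hop
    return out[:t_out] / np.maximum(norm[:t_out], 1e-8)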
26. Timescaling module
- Some examples
- Vocal
- Orchestral
- Jazz
- Funk
27. Other contributions: F0 estimation
- Monophonic F0 detector based on Two-Way Mismatch (Maher & Beauchamp, 1993; a simplified sketch follows this list)
- Integration into the 2.1.1 modules will use single-frame estimations; this may not be optimal, as context (previous F0, next F0, instrument, etc.) is not considered
- Polyphonic F0 detector (Klapuri, 2000) using bandwise processing
- Intended mainly for polyphonic-monotimbral instruments or small ensembles, not for dense mixtures of sounds
- Estimation of candidates is performed for each analysis frame; several candidates are obtained
- Tracking of candidates is not yet implemented
- Our main interest is in deriving a predominant F0 only
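A deliberately simplified sketch of the Two-Way Mismatch idea (after Maher & Beauchamp, 1993), assuming NumPy arrays of spectral peak frequencies and magnitudes; the published magnitude-weighting constants q and r are omitted for brevity:

import numpy as np

def twm_error(f0, peak_freqs, peak_mags, n_harm=10, p=0.5, rho=0.33):
    # Lower error means the candidate f0 better explains the measured peaks.
    harmonics = f0 * np.arange(1, n_harm + 1)
    a_max = peak_mags.max() + 1e-12
    # Predicted-to-measured: each harmonic vs. its closest peak
    err_pm = sum(np.min(np.abs(peak_freqs - h)) * h ** -p for h in harmonics)
    # Measured-to-predicted: each peak vs. its closest harmonic, weighted by salience
    err_mp = sum((a / a_max) * np.min(np.abs(harmonics - f)) * f ** -p
                 for f, a in zip(peak_freqs, peak_mags))
    return err_pm / n_harm + rho * err_mp / len(peak_freqs)

def estimate_f0(peak_freqs, peak_mags, candidates):
    # Single-frame estimate: the candidate with the minimum TWM error.
    return min(candidates, key=lambda f0: twm_error(f0, peak_freqs, peak_mags))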
28. Other contributions: polyphonic F0 detection
29. UPF CUIDADO team
- Xavier Amatriain
- Lars Fabig
- Emilia Gómez
- Günter Geiger
- Fabien Gouyon
- Gilles Peterschmitt
- Julien Ricard
- Perfecto Herrera
30. What's next? (beyond retrieval)
- Examples of achievable functionalities:
- Music structure visualization
- Melody description visualization and manipulation (content-based timescaling)
- Rhythm loop processing
- Matching songs in playlists by tempo
31. What's next? Music structure visualization
32. What's next? Melody description visualization and manipulation (content-based timescaling)
33. What's next? Visualizing, navigating, and editing with rhythm marks
- Pulse marks visualization
- Pulse-based editing (duplicating parts, snap to pulse mark) and navigation (skip to next tick); a sketch follows
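A sketch of the snap and skip operations, assuming sorted lists of pulse/tick times in seconds from the rhythm extractor:

import bisect

def snap_to_pulse(t, pulse_marks):
    # Snap a time position (seconds) to the nearest pulse mark.
    i = bisect.bisect_left(pulse_marks, t)
    candidates = pulse_marks[max(0, i - 1):i + 1]
    return min(candidates, key=lambda m: abs(m - t))

def skip_to_next_tick(t, tick_marks):
    # Navigation: jump to the first tick strictly after t.
    i = bisect.bisect_right(tick_marks, t)
    return tick_marks[i] if i < len(tick_marks) else None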
34. What's next? Combining rhythm and instrument descriptions
- Navigation by instrument occurrence (skip to the next snare)
- Muting an instrument
- Processing (applying an effect to) an instrument
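A sketch of the first two functionalities, assuming the labeler outputs (time, label) pairs and audio is a NumPy array of a sliced drum loop (zeroing a slice region only works when slices do not overlap; inside a dense mix this would require source separation):

def skip_to_next(events, t, instrument="Snare"):
    # First occurrence of the given instrument after time t, or None.
    times = sorted(time for time, label in events if label == instrument and time > t)
    return times[0] if times else None

def mute_instrument(audio, sr, events, slice_dur, instrument):
    # Crude muting: zero out the slices carrying the given label.
    out = audio.copy()
    for time, label in events:
        if label == instrument:
            start = int(time * sr)
            out[start:start + int(slice_dur * sr)] = 0.0
    return out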
35. What's next? Combining rhythm and instrument descriptions
- Building MIDI maps, reconstructing the loop, and generating timbral variations
(Flow diagram: original audio file → tick extraction and labeling → MIDI map → MIDI file; an instrument-category and/or timbre-similarity query against a sound database drives the reconstruction of a new audio file.)
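A hypothetical sketch of the MIDI-map step: labeled ticks become note events that a renderer can re-synthesize from database samples (the drum-map note numbers follow the General MIDI percussion map; the mapping itself is an assumption):

# General MIDI percussion note numbers for the coarse taxonomy
DRUM_MAP = {"Kick": 36, "Snare": 38, "HH": 42, "Tom": 45, "Cymbal": 49}

def build_midi_map(labeled_ticks, tempo_bpm):
    # labeled_ticks: (time_seconds, label) pairs from tick extraction + labeling.
    # Returns (midi_note, beat_position) events; rendering them with samples
    # retrieved by category or timbre similarity reconstructs or varies the loop.
    beat = 60.0 / tempo_bpm
    return [(DRUM_MAP[label], t / beat)
            for t, label in labeled_ticks if label in DRUM_MAP]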
36. What's next? Combining rhythm descriptions and timescaling
Transition from one pattern to another that does not match in tempo
37. What's next? Matching songs by tempo for playlist generation
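For the last two slides, a sketch assuming per-song tempi from the rhythm description extractor (function names are hypothetical):

def stretch_factor(tempo_from, tempo_to):
    # Duration scaling needed to play a pattern at a new tempo;
    # this is the input to the timescaling module when matching two patterns.
    return tempo_from / tempo_to

def order_by_tempo(songs, start_tempo):
    # Greedy playlist ordering that minimizes tempo jumps between songs.
    # songs: list of (title, tempo_bpm) pairs.
    remaining, playlist, tempo = list(songs), [], start_tempo
    while remaining:
        nxt = min(remaining, key=lambda s: abs(s[1] - tempo))
        playlist.append(nxt)
        tempo = nxt[1]
        remaining.remove(nxt)
    return playlist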