France Telecom's expectations and research in Object Recognition - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
France Telecom's expectations and research in
Object Recognition
  • Henri Sanson, Christophe Laurent, Olivier Bernier

2
Outline
  • France Telecom Markets and context evolution
  • Visual content indexing from low-level to
    semantic description
  • Overview of current research in Image retrieval
    and video annotation
  • Object recognition for Human Computer Interface

3
France Telecom Markets and Applications
  • France Telecom is a global telecommunications
    operator
  • from telephony to multimedia and audiovisual
    services
  • Fixed networks/services
  • Mobile networks/services
  • Internet access and services
  • IP Data and communication services for
    corporations
  • Two structuring trends
  • An increasing importance and presence of visual
    content in the services
  • leveraging the higher bandwidth
  • high value-added services (people pay for content
    and for the means to reach it easily)
  • A need to compensate for increasing technological
    and functional complexity by providing natural
    Human-Machine/Service Interfaces
  • Vocal Interfaces
  • Visual interfaces

4
Visual content indexing from low-level to
semantic description (1)
  • Context
  • more and more visual content, huge data volumes,
    temporal constraints for video
  • Need for efficient indexing methods enabling fast
    and relevant access
  • Applications
  • Media asset management: in addition to
    traditional audiovisual companies (TV,
    production), more and more enterprises own image
    or video assets and face management issues
  • Relevance has prime importance
  • But the cost-effectiveness requirement is
    becoming more and more acute
  • Web search and filtering engines
  • Huge volumes and highly variable content: robust
    automatic indexing is the only solution
  • Although surrounding text may be used, the visual
    content itself is the only reliable source to use
  • Video surveillance
  • Specific environment and content type
  • Automatic processing

5
Visual content indexing from low-level to
semantic description (2)
  • Traditionally, two radically opposite approaches
  • Accurate but manual: semantic annotation and
    ontologies
  • Time consuming
  • The indexing choices limit possible queries
  • Low-level feature descriptions, a.k.a. the
    "Color, Texture and Shape" MPEG-7 Visual
    Framework
  • Automatic processing, but very limited in
    practice (save for some classification purposes);
    relies on query-by-example, hence of little
    practical use
  • Emerging trend: convergence of the two previous
    paradigms
  • Automated knowledge-based semantic indexing using
    visual recognition
  • Many advantages
  • Semantic and automatic
  • No linguistic barrier
  • Complementing the indexing later is always
    possible
  • But still difficult!

6
Visual content indexing from low-level to
semantic description (3)
  • Application constraints impacting the
    recognition
  • High variability of shooting conditions
  • the same object can appear very differently:
    color, pose, scale, shadows, ...
  • High variability in the content type (indoor,
    outdoor, News, movies, sports)
  • Potentially huge number of objects or object
    categories to recognize concurrently
  • Video: real-time operation is targeted, and even
    much faster processing for still images
  • A recognition approach as flexible as possible
    is expected for generic objects
  • Qualification of the methods must ultimately be
    done by real experimentation "on the ground" by
    actual end users, and is measured by their
    satisfaction rate.

7
Current work (1)
  • Research
  • Color space invariance w.r.t shooting conditions
  • Salient feature-based image retrieval and object
    description
  • Face detection and recognition in generic images
  • Development: video indexing platform
  • Video shot change detection, specific image
    labelling (news speaker, weather report /
    commercial jingle), face detection, text
    detection
  • Audio/speech: speech/music/other segmentation,
    keyword recognition, free-vocabulary phonetic
    search

8
Saliency-based Color Image Indexing
  • Image signature is extracted from a limited
    number of perceptually important pixels called
    salient points
  • Salient points are computed by combining a
    discrete wavelet transform with a Zerotree
    representation of wavelet coefficients
  • Salient points are located on the sharpest
    boundaries
  • The image signature is composed of a color
    correlogram computed in the neighborhood of each
    salient point
  • This signature can be complemented with a texture
    signature computed around the salient points
  • An invariant color space (c1c2c3) is used for
    robustness to imaging conditions (a simplified
    sketch of this pipeline follows below)
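
The slide lists the ingredients of the signature without implementation detail. Below is a minimal, simplified sketch (not France Telecom's actual implementation) of three of them: the c1c2c3 invariant color space, salient-point selection from the finest-level wavelet detail coefficients (the real method ranks coefficients through a zerotree spanning all decomposition levels, which is omitted here), and a local color autocorrelogram accumulated around the salient points. The function names, parameter defaults and the horizontal-only offsets in the correlogram are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets, for the discrete wavelet transform


def c1c2c3(rgb):
    """Gevers-Smeulders c1c2c3 color space, largely invariant to
    shading/imaging conditions: angle of each channel against the
    maximum of the other two."""
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    eps = 1e-6  # avoid division by zero
    return np.stack([np.arctan(r / (np.maximum(g, b) + eps)),
                     np.arctan(g / (np.maximum(r, b) + eps)),
                     np.arctan(b / (np.maximum(r, g) + eps))], axis=-1)


def salient_points(gray, num_points=100):
    """Simplified salient-point detector: keep the locations whose
    finest-level wavelet detail coefficients carry the most energy
    (the original method ranks coefficients via a zerotree across
    all decomposition levels)."""
    _, (ch, cv, cd) = pywt.wavedec2(gray.astype(float), 'haar', level=1)
    energy = np.abs(ch) + np.abs(cv) + np.abs(cd)
    flat = np.argsort(energy, axis=None)[::-1][:num_points]
    rows, cols = np.unravel_index(flat, energy.shape)
    return np.stack([rows * 2, cols * 2], axis=1)  # back to image coordinates


def local_autocorrelogram(labels, points, n_colors, radius=8, distances=(1, 3, 5)):
    """Color autocorrelogram accumulated over small windows around the
    salient points; 'labels' holds a quantized color index per pixel.
    Only horizontal offsets are counted, to keep the sketch short."""
    hist = np.zeros((len(distances), n_colors))
    for r, c in points:
        win = labels[max(r - radius, 0):r + radius, max(c - radius, 0):c + radius]
        for k, d in enumerate(distances):
            same = win[:, :-d] == win[:, d:]        # same color at distance d
            np.add.at(hist[k], win[:, :-d][same], 1)
    return hist / max(hist.sum(), 1.0)              # normalized signature
```

In a retrieval setting, database and query images would then be compared through a distance between these signatures, optionally concatenated with the texture signature mentioned on the slide.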

9
Salient Points Extraction
10
Experimental Results
  • Database containing 2000 TV images
  • Selection of 18 difficult queries
  • Computation of a ranking metric (one common
    choice is sketched below)
  • Comparison with the MPEG-7 SCD (Scalable Color
    Descriptor)
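
The slides do not specify which ranking metric was computed. One metric commonly used to score content-based retrieval runs is the normalized average rank (0 for perfect retrieval, roughly 0.5 for a random ranking); the helper below illustrates that assumed choice, not necessarily the metric used in these experiments.

```python
def normalized_average_rank(relevant_ranks, n_database):
    """Normalized average rank for one query: 'relevant_ranks' are the
    1-based positions of the relevant images in the returned list.
    0.0 means all relevant images were ranked first; ~0.5 corresponds
    to a random ordering."""
    n_rel = len(relevant_ranks)
    best_possible = n_rel * (n_rel + 1) / 2          # sum of ranks 1..n_rel
    return (sum(relevant_ranks) - best_possible) / (n_database * n_rel)


# Hypothetical usage: average the metric over the 18 queries for each
# descriptor (salient-point signature vs. MPEG-7 Scalable Color Descriptor).
```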

11
Object recognition for Human Computer Interface
(1)
  • Context
  • Service functionalities are becoming more and
    more sophisticated every day
  • End users are expecting simpler user interfaces
  • Visual interactions appear to be a good
    complement to more usual vocal interfaces
  • Universality: much less constrained by
    linguistic variability
  • Web cams are widespread
  • Possibly less sensitive to environmental noise /
    capture conditions
  • Permits fast interaction: an image is worth a
    thousand words

12
Visual recognition for Human-Computer Interfaces
  • Face detection and tracking
  • Neural Network based face detection for still
    images.
  • Extension to real time face detection in video
    streams.
  • Real time face tracking for HCI using
    statistical models (EM, particle filtering).
  • Gesture recognition
  • Static hand posture recognition based on neural
    networks.
  • Dynamic gesture recognition (HMMs, IOHMMs and
    GIOHMMs); a minimal forward-algorithm sketch
    follows after this list.
  • Body tracking in 2D.
  • Body tracking in 3D using disparity cameras
    (Triclops).
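
The slide names the model families without further detail. As a minimal illustration of the plain-HMM case, the sketch below scores an observation sequence (for instance quantized hand positions) against a bank of discrete HMMs, one per gesture, using the scaled forward algorithm. The gesture names, the symbol quantization and the parameter layout are assumptions for illustration, and the IOHMM/GIOHMM variants are not covered.

```python
import numpy as np


def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM).
    obs: sequence of discrete symbol indices
    pi:  initial state distribution, shape (n_states,)
    A:   state transition matrix,    shape (n_states, n_states)
    B:   emission probabilities,     shape (n_states, n_symbols)"""
    alpha = pi * B[:, obs[0]]                 # initialization
    log_p = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = (alpha @ A) * B[:, o]     # induction step
        scale = alpha.sum()
        log_p += np.log(scale)                # accumulate log-likelihood
        alpha /= scale                        # rescale to avoid underflow
    return log_p


def recognize(obs, gesture_models):
    """Return the gesture whose HMM gives the observation sequence the
    highest likelihood. 'gesture_models' maps a (hypothetical) gesture
    name, e.g. 'swipe_left', to its (pi, A, B) parameters, which would
    be trained offline (typically with Baum-Welch)."""
    return max(gesture_models,
               key=lambda g: forward_log_likelihood(obs, *gesture_models[g]))
```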