Title: Informedia and Health Care video archives
1 Research in Creating Video Archives and the
Potential for Health Care
Alex Hauptmann August 1, 2005
Carnegie Mellon University Pittsburgh, USA
2 Outline
- Overview of the Informedia project
- Metadata exchange
- CareMedia video archives for medical
observations
3 Informedia Project Mission
- Enable Search and Discovery in the Video Medium
- Automated information and metadata extraction from video
- Full-content search and retrieval of spoken language and visual documents
- Integration of speech, image and natural language understanding for library creation and exploration
- Validation through user testbeds
4 Application of Diverse, Imperfect Technologies
- Speech understanding for automatically derived transcripts
- Image understanding for video paragraphing; face, text, object and scene recognition
- Natural language for segmentation, query understanding and content summarization
- Human computer interaction for video display, navigation and reuse
5 Video Search Demonstration
7 Informedia Metadata Extraction
Metadata Extractor
User Interface
Visualization Templates
(final representation)
Summarizer
8 Informedia DVL Overview
Modularize the metadata extraction process.
Specify the metadata exchange interface and processing synchronization.
XML/XSL data representation for user-customized interfaces.
9 Metadata Creation Paradigm
- Goal: Provide a logical view of metadata creation modules and their logical relationships
Metadata Creation
Video-based Analysis
Segment-based Analysis
- Face detection
- VOCR
- Title generation
- Topic Assignment
- Capitalization
- Phrase Extraction
- Geocoding
- (Still object detection)
- (Moving object detection)
- Scene break detection
- Black frame detection
- Speech recognition
- Signal-to-noise ratio
Segmentation Transcript Processing
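Two of the video-based modules listed above, black frame detection and scene break detection, can be illustrated with a minimal sketch. It assumes each frame is available as a flat sequence of grayscale values (0 to 255). The function names, the histogram-difference approach, and both thresholds are illustrative assumptions, not the Informedia implementation.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0..255 (assumed representation)


def is_black_frame(frame: Frame, max_mean: float = 16.0) -> bool:
    """Return True when the average luminance is at or below max_mean.

    max_mean is a hypothetical threshold chosen for illustration.
    """
    return sum(frame) / len(frame) <= max_mean


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized luminance histogram with `bins` equal-width buckets."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def scene_breaks(frames: Sequence[Frame], min_diff: float = 0.5) -> List[int]:
    """Indices i where frame i differs strongly from frame i-1.

    The difference is the L1 distance between normalized histograms
    (range 0..2); min_diff is a hypothetical threshold.
    """
    breaks = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) >= min_diff:
            breaks.append(i)
        prev = cur
    return breaks
```

In a modular pipeline of the kind the slide describes, each function would be one extractor whose output (a flag or a list of break positions) is written out as metadata for later modules.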
10 Informedia System Structure
11 Text and Face Detection
12 Camera and Motion Detection
Pan
Right object motion (not pan left)
13 Video OCR
Final VOCR Results: GERRY ADAMS SINN FEIN PRESIDENT
14 Annotations and Data Export
- Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)
- Annotations are included in the index (searchable separately or combined with transcript)
- Personal annotations are typed or spoken comments that are established on a per-user basis
  - bookmarking or commentary
  - fully indexed and searchable with other data
- Shot bookmarking implemented and tested with both novice and expert users
- XML and segment metadata import/export capability
- Conversion to MPEG-7
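The idea that annotations are "searchable separately or combined with transcript" can be sketched as a small per-field inverted index. The class name, field names, and whitespace tokenization below are assumptions made for illustration; the slides do not describe how the Informedia index is built.

```python
from collections import defaultdict


class SegmentIndex:
    """Toy inverted index keeping one posting table per metadata field."""

    def __init__(self):
        # field name -> term -> set of segment ids
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, segment_id, field, text):
        """Index `text` under `field` (e.g. 'transcript', 'vocr', 'personal')."""
        for term in text.lower().split():
            self._postings[field][term].add(segment_id)

    def search(self, term, fields=None):
        """Return segment ids containing `term`.

        With fields=None every field is searched (the combined case);
        otherwise only the named fields are searched.
        """
        term = term.lower()
        chosen = fields if fields is not None else list(self._postings)
        hits = set()
        for field in chosen:
            hits |= self._postings.get(field, {}).get(term, set())
        return hits
```

A query for "adams" over all fields would then return segments where the name appears in the spoken transcript or in the on-screen text, while restricting `fields` to `["vocr"]` would return only the latter.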
15 Informedia XML Presentation Architecture
16 Efficient navigation is especially important with video
- Multiple levels of abstraction and summarization
- Visual icons with relevance measure
- One-line headlines
- Static film strip views
- Active video skims
- Transcript following (even when errorful)
- Let the eyes do the searching
17 Interfaces: Let the eyes do the searching
18 The Challenge of Extensibility (current work)
19 Informedia Current Capabilities
- Information retrieval in both spoken language and video/image domains
- Fully automated transcriptions generated entirely through speech recognition or with closed captions
- Information summaries at varying detail, both visually and textually
- Full content georeferencing of every event for geographic display and query
- Extraction and reuse of video documents for Web-based access and presentation
- All integrated into a user tested and validated interface
20 Informedia Focus
- Allow complete access to information within multimedia sources
- Generate metadata descriptions
- Segment audio and video into meaningful segments
- Provide abstractions for reviewing those segments
- Improve query and browsing interfaces to this data
- Iterate based on user studies
21 Digital Human Memory
- Technology for creating a continuously recorded, digital, high fidelity record of one's whole life in video form
- Personal, wearable units which record audio, video, GPS and electronic communications, capturing all that is heard, seen and experienced
- Transforming this personal history into a meaningful, accessible information resource with auto-search and auto-summarization
- Feasible: 100 MB/h, or 1 GB/day, or .33 TB/year, or 30 TB/lifetime
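The feasibility figures above can be checked with a short back-of-the-envelope calculation. The slide gives only the rates; the 10 recorded hours per day, the 85-year lifetime, and decimal units (1 GB = 1000 MB) are assumptions chosen here to show how the numbers relate.

```python
def lifetime_storage(mb_per_hour=100.0, hours_per_day=10.0, years=85.0):
    """Return (GB/day, TB/year, TB/lifetime) for a continuous personal recording.

    mb_per_hour comes from the slide; hours_per_day and years are
    assumptions, and decimal units (1000 MB = 1 GB) are used throughout.
    """
    gb_per_day = mb_per_hour * hours_per_day / 1000.0
    tb_per_year = gb_per_day * 365 / 1000.0
    tb_per_lifetime = tb_per_year * years
    return gb_per_day, tb_per_year, tb_per_lifetime


gb_day, tb_year, tb_life = lifetime_storage()
print(f"{gb_day:.2f} GB/day, {tb_year:.3f} TB/year, {tb_life:.1f} TB/lifetime")
```

Under these assumptions the yearly total comes out near 0.37 TB and the lifetime total near 31 TB, the same order of magnitude as the slide's .33 TB/year and 30 TB/lifetime; the slide does not state the exact assumptions behind its own figures.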
22 Data Collection: The Vest
24 LSCOM: A Large Scale Concept Ontology for Multimedia
- Collaborative activity of three critical communities (Users; Library Scientists and Knowledge Experts; and Technical Researchers, Algorithm, System and Solution Designers) to create a user-driven concept ontology for analysis of video broadcast news
Lexicon and Ontology: 1000 or more concepts
25 Large Scale Concept Ontology for Multimedia Understanding (LSCOM): Scope
Analyst User Interactions
Pre-Analyst Annotation
Analyst Tools
- Raw audio video (possibly plus some metadata)
- Extractable feature descriptors (e.g., cut-rate, motion)
Annotation Engine
Feature Extraction
Search Engines
Terms
LSCOM Workshop SCOPE
- Higher-level subjective interpretation
Inference Engines
Inference Engines
Features
Annotation metadata
Maximizes constraints such as computability, utility, reusability, compatibility (e.g., Cyc, OWL, etc.)
Inference engines may be rule-based, statistically-based, hard-wired, etc.
26 The Power of an Ontology
- An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
- Descriptive power can be achieved if a small number of primitives can be combined using a few composition operators and a limited number of relations to form multiple threads that generate a large number of complex concepts
- This compositional structure leads to a divide-and-conquer strategy that makes it possible to make progress on several fronts simultaneously
  - Different research groups can focus on different concepts
  - Primitive concept recognition methods can be shared and reused
  - Composite concepts can be used as parts of other concepts
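The compositional idea on this slide (a few primitives, a few operators, composites reused inside other composites) can be sketched in a handful of lines. The `Concept` class, the two operators, and the concept names used in the usage note are hypothetical illustrations, not part of LSCOM.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Concept:
    """A primitive concept (no parts) or a composite built from other concepts."""
    name: str
    parts: Tuple["Concept", ...] = ()
    op: str = "and"  # composition operator: "and" or "or" (illustrative set)

    def holds(self, detected: Set[str]) -> bool:
        """Evaluate the concept against the set of primitive names detected in a shot."""
        if not self.parts:
            # Primitive: true when its own detector fired.
            return self.name in detected
        results = (part.holds(detected) for part in self.parts)
        return all(results) if self.op == "and" else any(results)
```

For example, a composite `Concept("outdoor_gathering", (Concept("crowd"), Concept("outdoors")))` holds only when both primitives are detected, and it can itself be used as a part of a larger concept, which is the reuse property the last bullet describes.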
27 What an Ontology with Background Knowledge and Inference can give us
(?x) (feelsEmotion x Happiness Positive)
- Caption: A man watching his daughter take her first step
(?x,y) (and (father x y) (gender x Female) (sees
x y) (walking
28 Broadcast News Video Content Description Ontology
- Why the Focus on Broadcast News Domain?
  - Critical mass of users, content providers, applications
  - Good content availability (TRECVID, LDC, FBIS)
  - Shares large set of core concepts with other domains
- Ontology Formalism
  - Entity-Relationship (E-R) Graphs
  - RDF, DAML / DAML+OIL, W3C OWL, CycL
  - MPEG-7, MediaNet, VEML
- Seed Representations
  - TRECVID-2003 News Lexicon (Annotation Forum)
  - Library of Congress TGM
  - CYC knowledge representation (ontology)
  - CNN, BBC Classification Scheme, TVAnytime, Comstock,
MPEG-7 Video Annotation Tool
29 MPEG-7 for Metadata Exchange
Multimedia Content Description Interface Standard
- Standardize a framework for describing audio-visual content
- Describe different aspects of multimedia documents at different abstraction levels
- Create descriptions to form the basis of applications like search, filtering and browsing multimedia content
- Does NOT specify video compression or transmission
- MPEG-7 descriptions live in separate files from the video
- Extensible for new description schemas
30 What is MPEG-7
Four Types of Normative Elements
- Descriptors (Ds)
  - Primarily to describe low-level audio or visual features
- Description Schemes (DSs)
  - Describe higher-level AV features such as regions, segments, objects, events and other immutable metadata related to creation and production, usage, and so forth
- Description Definition Language (DDL)
  - Allow specifying new description schemes and descriptors
- Coding Schemes
  - Specify how to code the needed descriptions to satisfy the compression and the transmission requirements
31 MPEG-7 Application Chain
32 Example
33 Simple Example
<?xml version="1.0" encoding="iso-8859-1"?>
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
       xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 .\Mpeg7-2001.xsd">
  <Description xsi:type="ContentEntityType">
    <DescriptionMetadata>
      <LastUpdate>2001-04-06T00:00:00+00:00</LastUpdate>
    </DescriptionMetadata>
    <CreationInformation id="179650">
      <Creation>
        <Title xml:lang="en">CNN World Today - 09-28-2000</Title>
        <Abstract>
          <FreeTextAnnotation>
            Today, .......
          </FreeTextAnnotation>
        </Abstract>
        <CreationCoordinates>
          <CreationLocation>
            <Name xml:lang="es">CNN</Name>
            <Country>us</Country>
            <AdministrativeUnit>New York</AdministrativeUnit>
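Because MPEG-7 descriptions are plain XML kept separate from the video, a consumer can read them with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the example from the slide. The closing tags were added here so that the sample parses (the slide's excerpt is cut off), and the `describe` helper and its returned keys are illustrative names, not part of any MPEG-7 toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace prefix "m" is local to this sketch; the URI is the one on the slide.
NS = {"m": "urn:mpeg:mpeg7:schema:2001"}

# Shortened sample based on the slide; closing tags added so it is well-formed.
SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Description xsi:type="ContentEntityType">
    <CreationInformation id="179650">
      <Creation>
        <Title xml:lang="en">CNN World Today - 09-28-2000</Title>
        <CreationCoordinates>
          <CreationLocation>
            <Name xml:lang="es">CNN</Name>
            <Country>us</Country>
            <AdministrativeUnit>New York</AdministrativeUnit>
          </CreationLocation>
        </CreationCoordinates>
      </Creation>
    </CreationInformation>
  </Description>
</Mpeg7>
"""


def describe(xml_bytes):
    """Pull a few creation fields out of an MPEG-7 description.

    Elements are located with a descendant search, so the result does not
    depend on how deeply they are nested in a particular description.
    """
    root = ET.fromstring(xml_bytes)

    def text_of(tag):
        node = root.find(f".//m:{tag}", NS)
        return node.text if node is not None else None

    return {
        "title": text_of("Title"),
        "country": text_of("Country"),
        "unit": text_of("AdministrativeUnit"),
    }


print(describe(SAMPLE))
```

The namespace mapping matters: every element in the sample inherits the default `urn:mpeg:mpeg7:schema:2001` namespace, so an unqualified search for `Title` would find nothing.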
34 More Information about MPEG-7
- MPEG: http://www.cselt.it/mpeg
- MPEG-7 Industry Forum: http://www.mpeg7.org
- www
- Overview: www.telecomitalialab.com/mpeg/standards/mpeg-7/mpeg-7.htm
- Scheme: http://pmedia.i2.ibm.com:8000/mpeg7/schema/
- Mds: http://www.mpeg7.ee.columbia.edu/
- DDL: http://archive.dstc.edu.au/mpeg7-ddl/
- XM: http://www.lis.ei.tum.de/research/bv/topics/mmdb/e_mpeg7.html
35 CareMedia: Behavior Observations in a Nursing Home
36 CareMedia: Automated Behavior Analysis in the Nursing Home
- Primary objective is improved quality of assisted care
  - Example: automating detection of behavioral and psychological symptoms of dementia (BPSD)
- Apply more broadly to monitoring and maintaining the quality of life
- Ultimately, make automated, quantitative measurements to
  - Explore relation of symptoms to environments in which they occur
  - Evaluate symptoms longitudinally
  - Determine the frequency of symptoms
  - Develop patient profiles of responses to treatment interventions
- >>>> Enable earlier intervention to sustain quality of life
37 Applications in the Nursing Home
- Clinical/Research
  - Tracking patient behavior and incidents in long-term care facilities
    - e.g., disruptive vocalizations, falls
  - Recording patient mobility and activity levels
  - Correlating with time of day, location and environmental factors
  - Observing effects of drugs on individuals and groups
- Patient
  - Cognitive assist - reminding, alerting and summoning help
- Staff training
  - Analysis of video records of incidents used for training
- Management
  - Monitoring and documenting compliance
38 What is Presently Measured by Humans
- The Pittsburgh Agitation Scale
  - Aberrant Vocalizations (repetitious requests or complaints, non-verbal vocalizations, i.e. moaning)
  - Motor Agitation (pacing, wandering, rocking in chair)
  - Aggressiveness (vocal threats, threatening gestures)
  - Resisting Care (pushing away to avoid tasks)
39 What are we trying to detect and measure?
- Person Tracking
- Person Identification
- Gross Motor Behavior (broad area, e.g., walking/gait, falling, wheel chair motion)
- Small Motion Behavior (task-specific, e.g., eating, washing hands, combing hair)
- Fine Motion Behaviors (close-in, e.g., twitches, tremors, eye-blinking, frowns)
- Social Interactions
- Gradual Trends over Time
- Rare Events
- Required in processing: Obfuscation (for privacy protection)
40 CareMedia: What are the observables?
- Who?
- Identify people across cameras, days.
- What are they doing?
- Wandering around
- Working on tasks
- Looking for things
- Eating, sleeping in public
- How well did they do it?
- Quantify normal performance
- Detect/report anomalies
41 What could the system reporting look like?
42 Coarse Motion Measurement
Informedia Digital Libraries
- Applying mean-shift analysis
target detection
red indicates target
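Mean-shift analysis, named above for target detection, repeatedly moves a window to the centroid of the points it currently covers until it settles on a local density peak. The sketch below is a minimal flat-kernel version over 2D points (for example, foreground pixel coordinates). The function name, bandwidth, and stopping tolerance are assumptions for illustration and are not taken from the CareMedia system.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def mean_shift(points: Iterable[Point], start: Point, bandwidth: float = 3.0,
               max_iter: int = 50, tol: float = 1e-3) -> Point:
    """Follow the local density gradient from `start` to a nearby mode.

    Uses a flat (uniform) kernel: every point within `bandwidth` of the
    current center contributes equally to the next center.
    """
    pts = list(points)
    cx, cy = start
    for _ in range(max_iter):
        near = [(x, y) for x, y in pts
                if (x - cx) ** 2 + (y - cy) ** 2 <= bandwidth ** 2]
        if not near:
            break  # empty window: nothing to move toward
        nx = sum(x for x, _ in near) / len(near)
        ny = sum(y for _, y in near) / len(near)
        shift = ((nx - cx) ** 2 + (ny - cy) ** 2) ** 0.5
        cx, cy = nx, ny
        if shift < tol:
            break  # converged
    return cx, cy
```

Started from two different seeds, the procedure converges to two different modes when the point set contains two clusters, which is the property that makes it usable for separating people whose foreground blobs have merged.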
43 Fine Motion with Directions
Applying optical flow analysis
44 Capturing Key Events
Unobserved Aggression
Observed Elopement
45 Measure Normal Activity, Detect What's Not
46 Problem: Privacy Protection in Public Places
- Block the persons that are reluctant to be captured in the video
  - ¼ of nursing home residents deny disclosure of their images
- Real-time automatic people tracking framework
  - Detect foreground information; adapt for real-time background
  - Multi-target, multi-assignment blob matching
  - Apply mean shift algorithm to separate merged persons
47 Edge Motion Imaging
48 Obscured Faces
49 Problem: Monitoring in Private Spaces
- Observe and monitor activity without storing video
  - Maintain only feature vectors; classify in real-time
  - Record event type, time of day, duration
  - Detect changes in daily pattern of activity
- Example: Monitor bathroom/mirror activities
  - What: brushing teeth, combing hair, washing hands, washing face
  - How: small camera behind center of mirror, mono microphone, embedded computing
- Create summary
  - how long, how often, chart by day
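The summary step above (how long, how often, by day) needs only the recorded event type, start time, and duration, never the video itself. A minimal sketch follows; the tuple layout, the function name, and the per-day dictionary output are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def summarize(events):
    """Aggregate classified events into per-day counts and total durations.

    events: iterable of (event_type, start, duration_seconds), where
    start is a datetime. Returns a dict keyed by (ISO date, event_type).
    """
    summary = defaultdict(lambda: {"count": 0, "total_seconds": 0.0})
    for event_type, start, duration in events:
        cell = summary[(start.date().isoformat(), event_type)]
        cell["count"] += 1               # how often
        cell["total_seconds"] += duration  # how long
    return dict(summary)


log = [
    ("brushing teeth", datetime(2005, 8, 1, 7, 30), 90),
    ("washing hands", datetime(2005, 8, 1, 7, 32), 20),
    ("brushing teeth", datetime(2005, 8, 1, 21, 5), 60),
    ("brushing teeth", datetime(2005, 8, 2, 7, 40), 75),
]
daily = summarize(log)
```

Comparing the per-day rows over weeks is one simple way to notice the changes in daily activity pattern that the slide mentions.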
51 Conclusions (Designed for Controversy)
- The video data firehose has not yet started
- We need automatic metadata extraction to handle the volume
  - Manual indexing/archiving does not scale
- Automatic metadata extraction will improve over time
  - Provide for iterations of similar metadata with different quality
  - Consider confidence for automatic metadata
- Data doesn't have to be perfect to be useful and used
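One way to read the points about iterations of similar metadata and confidence is to keep every version of a field together with its source and a confidence score, and let consumers choose. The record layout, the field and source names, and the selection rule below are hypothetical; the slides do not specify a storage format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class MetadataVersion:
    field: str         # e.g. "transcript" (illustrative)
    value: str
    source: str        # which extractor or iteration produced it
    confidence: float  # 0.0 .. 1.0, as reported by the extractor


def best_per_field(versions: Iterable[MetadataVersion],
                   min_confidence: float = 0.0) -> Dict[str, MetadataVersion]:
    """Pick the highest-confidence version of each field.

    Versions below min_confidence are ignored, so a field may be absent
    from the result; earlier versions win ties.
    """
    best: Dict[str, MetadataVersion] = {}
    for v in versions:
        if v.confidence < min_confidence:
            continue
        if v.field not in best or v.confidence > best[v.field].confidence:
            best[v.field] = v
    return best
```

Because older versions are kept rather than overwritten, a later and better extractor can simply add a new version, and imperfect data remains usable in the meantime.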
52 Future Opportunities
- Instrument facilities with distributed sensors for precision
  - Force sensors in chairs, beds, carpeting
  - RFID in clothing, utensils
- Upgrade to hi-resolution cameras for fine motor detection
  - Measure facial expressions, tremors
- Conduct large-scale testbeds for validation
  - Comprehensive instrumentation in multiple homes
- Move through lesser levels of care to expand market
  - From constrained skilled care environments to less structured assisted and independent living
- >>>> Enable earlier detection and intervention
  - Delaying nursing home entry by 1 month saves $1.2B/year
53 People Identification
Silhouette Extraction
Classification
Person 1
Person 2
Person 3
54 Challenges of people identification
- Limited training data
- Imperfect feature representation (Color, Gait, Face)