Title: Using the NIF for Level 3 Data Mining
1. Using the NIF for Level 3 Data Mining: Case Study - SumsDB Database
David C. Van Essen
Washington University in St. Louis
Presentation to NIH Leadership 5 December, 2007
Supported by the Human Brain Project (NIMH, NSF,
NCI, NLM, NASA)
2. Outline
- Overview of SumsDB database (brain-mapping data)
- Stereotaxic coordinates - target for level 3 searches
- Demonstrate NIF-based searches
- Possible next steps
SumsDB
WebCaret
NIF
3. SumsDB (Surface Management System Database)
http://sumsdb.wustl.edu/sums/
4. SumsDB Attributes
- Diversity!
- Many species (human, macaque, rodents, apes)
- Surfaces (cerebral, cerebellar cortex) and volumes
- Atlases and individuals
- Different ages, disease conditions
- Diverse data overlays (> 50 data types)
- Functional MRI activations
- Stereotaxic coordinates (foci)
- Surface morphometry
- Connectivity
- Published, unpublished datasets
- Data access: online visualization (WebCaret), download
- Search options: coordinates, regional labels, functional attributes
5. SumsDB
- Current status
- 42 Gb archived data
- 12 Gb public data
- 70,000 files (archives)
- Since 2005
- 14,000 archive downloads for offline analyses
- 10,000 WebCaret launches
- 30 active user accounts (data upload)
6. Level 1 access: Finding and characterizing SumsDB
7. SumsDB - Level 3 Data Mining
- What data types to start with?
- Stereotaxic (3D) foci
8. Side-by-side comparisons are not easy!
9. Meta-analysis of 32 published mental rotation studies (Zacks, 2007)
- New insights regarding functional specialization
- Aided by surface-based visualization on PALS atlas
- Analysis entailed a laborious literature survey
- Data are now incorporated in SumsDB, WebCaret
10. Stereotaxic (3D) foci as targets for Level 3 Data Mining
- Current status of SumsDB
- 13,700 published stereotaxic coordinates
- > 500 studies
- Rich but concise dataset, excellent for data mining
- 10% of published data
13,700 foci displayed in WebCaret
11. Selected results (mental rotation) in WebCaret
Extensive metadata associated with each focus
Meta-analysis by Jeff Zacks (2007)
12. Improve searchability by adding Foci Attributes
Select multiple partitioning schemes (in Caret)
13. Viewing Foci Attributes
Cortical area assignments (multiple partitioning schemes)
Allows search by area (as well as spatial coordinates)
14. Level 3 Registration to NIF Mediator
- Requirements
- Create and register export schema
- (subset of primary schema)
- Map relevant terminology
- Start with Brodmann areas (more to come)
- Create SumsDB user account (nifsums on BIRNportal)
- (read-only access to portion of SumsDB)
- Done by
- Local DB admin (Ping Gu)
- NIF developers (Vadim Astakhov, Amarnath Gupta,
Bill Bug et al.)
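The first two requirements above (an export schema that is a subset of the primary schema, plus a terminology mapping) can be illustrated with a small sketch. Every field name and local label below is hypothetical and chosen only for illustration; the actual SumsDB schema and NIF registration format are not described in these slides. Only the target term "Brodmann.22" is taken from the search results shown later.

```python
# Hypothetical primary-schema fields for one focus record (illustrative only).
PRIMARY_FIELDS = ("focus_id", "x", "y", "z", "study", "area_label",
                  "uploader", "internal_path")

# Hypothetical export schema: a subset of the primary schema that is
# safe to expose through a read-only account.
EXPORT_FIELDS = ("focus_id", "x", "y", "z", "study", "area_label")

# Hypothetical mapping from local area labels to a shared vocabulary.
TERM_MAP = {
    "BA22": "Brodmann.22",
    "area 22": "Brodmann.22",
}


def export_view(record: dict) -> dict:
    """Return the export subset of a primary record with mapped terminology."""
    exported = {field: record[field] for field in EXPORT_FIELDS}
    label = record["area_label"]
    # Labels without a mapping are passed through unchanged.
    exported["area_label"] = TERM_MAP.get(label, label)
    return exported
```

Keeping the export fields as an explicit list means that anything not named in it (here, the hypothetical uploader and internal path) never leaves the local database.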
15. Results - Flexible NIF-mediated searches!
- Search by cortical areas
- Search by identified sulci
- Search by functional terms
- Search across multiple databases (federation)
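As a rough illustration of what a search across multiple databases involves, the sketch below sends one term to several sources and groups the hits by source. It is not the NIF mediator: the sources are plain in-memory lists, and the record fields ("title", "abstract") and function names are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def search_source(records: List[dict], term: str,
                  fields: Sequence[str] = ("title",)) -> List[dict]:
    """Return records whose selected fields contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        rec for rec in records
        if any(needle in str(rec.get(field, "")).lower() for field in fields)
    ]


def federated_search(sources: Dict[str, List[dict]], term: str,
                     fields: Sequence[str] = ("title",)) -> Dict[str, List[dict]]:
    """Run the same search against every source and group hits by source name."""
    return {
        name: search_source(records, term, fields)
        for name, records in sources.items()
    }
```

Which fields are searched is a parameter here because it changes the number of hits: a term that appears only in an abstract is missed by a title-only search.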
16. Search NIF for area 22
Too many options!
17. Search NIF for area 22
18. Area 22 search results - direct from NIF to SumsDB
19. 408 results for Brodmann.22; select for WebCaret visualization
20. Area 22 foci viewed on PALS human cortical atlas
21. Select particular studies of interest, view study metadata
22. Search SumsDB
Can search from within SumsDB, but less flexible (doesn't capitalize on NIF ontology)
23. Search NIF for dopamine, view SumsDB
1 search result from searching title; more if keywords and abstracts are searched
24. Search NIF for dopamine, view CCDB
25. Search NIF for dopamine, view NeuroMorpho
26. Results! Search NIF for cerebellum, view SumsDB
27. Search NIF for cerebellum, view CCDB
28. Search NIF for cerebellum, view Senselab
29. Possible Next Steps
- Expand vocabulary (multiple partitioning schemes; sulci, gyri)
- Support spatial queries (x,y,z coordinates, range)
- Display search results on atlas volumes as well as surfaces
- Incorporate other data types (fMRI maps)
- Extend to macaque, mouse, rat datasets
- Establish level 3 links with additional databases, e.g.
- Brainmap (stereotaxic foci; P. Fox)
- BrainInfo (macaque; D. Bowden)
- Allen Brain Atlas (mouse gene expression)
- fMRIDC (human; M. Gazzaniga)
- Brainmaps.org (histology; E.G. Jones)
- CoCoMac (connectivity; R. Kotter)
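The spatial queries proposed above (x,y,z coordinates, range) could take two simple forms: a bounding box, or a radius around a point. The sketch below shows both over an in-memory list of foci. The Focus class and its field names are assumptions for illustration, not the SumsDB data model, and coordinates are assumed to be in millimetres in a single common stereotaxic space.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Focus:
    """One stereotaxic focus (hypothetical fields, coordinates in mm)."""
    x: float
    y: float
    z: float
    study: str

    @property
    def xyz(self) -> Point:
        return (self.x, self.y, self.z)


def foci_in_box(foci: Iterable[Focus], lower: Point, upper: Point) -> List[Focus]:
    """Keep foci whose x, y and z each lie within [lower, upper], inclusive."""
    return [
        f for f in foci
        if all(lo <= v <= hi for v, lo, hi in zip(f.xyz, lower, upper))
    ]


def foci_within(foci: Iterable[Focus], center: Point, radius_mm: float) -> List[Focus]:
    """Keep foci whose Euclidean distance to center is at most radius_mm."""
    return [f for f in foci if math.dist(f.xyz, center) <= radius_mm]
```

A linear scan like this is adequate for a collection on the order of tens of thousands of foci; a spatial index would only become worthwhile for much larger collections or heavy query loads.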
30. Example 1. fMRI data in different databases
(29 studies)
(32 studies)
- Suppose we could ask NIF
- What brain regions are activated during mental rotation? (query multiple databases)
- What functional characteristics are associated with structural abnormalities identified in Williams Syndrome? (schizophrenia, Tourette's, ADHD, etc.)
- What abnormalities (fMRI activations or structural) have been reported in Alzheimer's?
31. Example 2. Gene expression, cortical areas in mouse
Gene expression (Arc): Allen Brain Atlas
Paxinos & Franklin areas: SumsDB/WebCaret
Allow conjoint searches - by area (SumsDB), high expression (ABA)
- What genes are expressed at a high level in mouse primary auditory cortex?
- What is the dopamine receptor distribution in the mouse homologue of human dorso-lateral prefrontal cortex?
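A conjoint search of the kind described above amounts to a join between two sources: one that assigns sites to cortical areas and one that reports expression levels at those sites. The sketch below is a toy version of that join. The shared site identifiers, the dictionary layouts and the threshold are all assumptions made for illustration; neither SumsDB nor the Allen Brain Atlas is queried, and how sites would actually be matched between the two is left open.

```python
from typing import Dict, List


def high_expression_genes(
    area: str,
    area_by_site: Dict[str, str],
    expression_by_site: Dict[str, Dict[str, float]],
    threshold: float,
) -> List[str]:
    """Return genes whose level reaches threshold at any site in the given area.

    area_by_site:       hypothetical area assignments (site id -> area name)
    expression_by_site: hypothetical expression data (site id -> gene -> level)
    """
    # Step 1: select sites by area, using the first source.
    sites = [site for site, name in area_by_site.items() if name == area]

    # Step 2: collect highly expressed genes at those sites, using the second.
    genes = set()
    for site in sites:
        for gene, level in expression_by_site.get(site, {}).items():
            if level >= threshold:
                genes.add(gene)
    return sorted(genes)
```

The join key is the weak point of this sketch: it presumes both sources already share site identifiers, which in practice would require registering both datasets to a common atlas.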
32. Summary
- NIF Level 3 registration works!
- Allows diverse queries across databases (federation)
- Linking with NIF can improve database design, content
- Currently still in pilot mode
- Many options for enhancement (NIF v1.0 and beyond)