Databases for Microarrays - PowerPoint PPT Presentation

Provided by: isrecI

1
Databases for Microarrays
  • Vidhya Jagannathan
  • SIB, Lausanne

2
Overview
  • Microarray data in a nutshell
  • Why databases?
  • What data to represent?
  • What is a database?
  • Different data models
  • E-R modelling
  • Microarray Databases
  • Standards being developed

3
Microarray Experiment
4
Microarray Data in a Nutshell
  • Lots of data must be managed before and after the
    experiment.
  • Data to be stored before the experiment:
  • Description of the array and the sample.
  • Direct access to all the cDNA and gene sequences,
    annotations, and physical DNA resources.
  • Data to be stored after the experiment:
  • Raw data: scanned images.
  • Gene expression matrix: relative expression
    levels observed at the various sites on the array.
  • Database software capable of handling large
    volumes of numeric and image data is therefore
    required.

5
Why Databases?
  • Tailored to the data type
  • Tailored to the scientists
  • Intuitive ways to query the data
  • Diagrams, forms, point and click, text etc.
  • Support for efficient answering of queries.
  • Query optimisation, indexes, compact physical
    storage.

6
Data Representation
  • Goal: represent data in an intuitive and
    convenient manner
  • Without unnecessary replication of information
  • Making it easy to write queries to find required
    information
  • Supporting efficient retrieval of required
    information

7
What is a Database?
  • A database is an organised collection of pieces
    of structured electronic information.
  • Example 1: Libraries use a database system to
    keep track of library inventory and loans.
  • Example 2: Airlines use a database system to
    manage their flights and reservations.
  • A collection of records kept for a common
    purpose such as these is known as a database.
  • The records of the database normally reside on a
    hard disk and are retrieved into computer memory
    only when they are accessed.
  • Microarray experiments have the same needs, which
    is why we discuss microarray databases.

8
Data Models
  • A data model describes a container for data and
    the methods to store and retrieve data from that
    container.
  • Data models are abstract: mathematical algorithms
    and concepts.
  • You cannot touch a data model.
  • Very useful nonetheless.

9
Types of Data Models
  • Ad-hoc file formats (not really data models!)
  • Relational data model
  • Object-relational data model
  • Object-oriented data model
  • XML (Extensible Markup Language)

10
Ad-hoc File Formats
  • The various 'ad-hoc' file formats in use for
    microarray data are:
  • Flat file formats.
  • Spreadsheet formats.
  • Last but not least, even MS Word documents!
  • A very rudimentary way to store data.
  • Sometimes contain redundant information.
  • Extremely inefficient for retrieving particular
    subsets of the results.

11
Relational Data Model
  • The most prevalent model, used in many databases
    developed today.
  • A collection of related information is
    represented as a set of tables.
  • Each data value is stored at the intersection of
    a row and a column.
  • All values in a column are of the same kind,
    which gives a simple form of data validation.
  • Rows are unique, so there is no data redundancy:
    every row is meaningful and can be identified by
    its unique key.
  • Uses Structured Query Language (SQL) for data
    storage, retrieval and manipulation.

12
Terminology
  • Relation or Table
  • Attributes or Columns
  • Records or Rows

13
Example
[Figure: example table with the table, a row (record), and a column (field) labelled]
14
Advantages of Relational Model
  • Allows information to be broken up into logical
    units and stored in tables.
  • Allows combining data from different tables in
    different ways to derive useful information.
  • Great for queries involving information from
    multiple original sources.
  • Can easily gather related information.
  • e.g. information about a particular gene from
    multiple datasets/experiments
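Gathering information about one gene across several experiments can be sketched with Python's built-in sqlite3 module. The table names, column names, and values below are hypothetical and serve only to illustrate combining tables; they are not taken from any real microarray database.

```python
import sqlite3

# Hypothetical schema and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE experiment (
        experiment_id INTEGER PRIMARY KEY,
        treatment TEXT NOT NULL
    );
    CREATE TABLE measurement (
        gene_id TEXT NOT NULL REFERENCES gene(gene_id),
        experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
        log_ratio REAL NOT NULL,
        PRIMARY KEY (gene_id, experiment_id)
    );
    INSERT INTO gene VALUES ('G1', 'geneA'), ('G2', 'geneB');
    INSERT INTO experiment VALUES (1, 'control'), (2, 'heat shock');
    INSERT INTO measurement VALUES ('G1', 1, 0.1), ('G1', 2, 1.8), ('G2', 1, -0.2);
""")


def measurements_for_gene(conn, gene_id):
    """Return (treatment, log_ratio) for every experiment that measured the gene."""
    return conn.execute(
        """
        SELECT e.treatment, m.log_ratio
        FROM measurement AS m
        JOIN experiment AS e ON e.experiment_id = m.experiment_id
        WHERE m.gene_id = ?
        ORDER BY e.experiment_id
        """,
        (gene_id,),
    ).fetchall()


rows = measurements_for_gene(conn, "G1")
```

Each fact is stored once in its own table, and the join reassembles the combined view only when a query asks for it.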

15
Example - ArrayExpress
16
Database Design Entity-Relationship Concept
[Figure: Entity A linked to Entity B by a relationship, with examples]
17
Entities
  • Entities are real-world objects,
  • e.g. gene
  • Entities contain attributes,
  • e.g. gene_id, sequence
  • Entities are drawn as rectangular boxes that hold
    the name of the entity and its attributes. Two
    different notations are shown, as there is no
    standard!

[Figure: the Gene entity drawn in notation 1 and in notation 2]
18
Relationship
  • Relationships provide connections between two or
    more entities,
  • e.g. which genes were used in which experiment.
  • When two entities are involved in a relationship,
    it is known as a binary relationship.
  • When three entities are involved in a
    relationship, it is called a ternary
    relationship.
  • When more than three entities are involved in a
    relationship, it is usually broken into one or
    more binary or ternary relationships.
  • Relationships are drawn as a line linking the
    involved entities.

[Figure: the Gene and Experiment entities linked by the used_in relationship]
19
Connectivity and Cardinality
  • Connectivity: describes the mapping of
    associated entity instances in a relationship
  • one or many
  • Cardinality: the actual number of related
    occurrences for each of the two entities
  • one-to-one, one-to-many, many-to-many

20
Example of one to one relationship
21
Example of one to many relationship
22
Example of many to many relationship (unresolved)
23
Example of many to many relationship (linking table)
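As a minimal sketch of resolving a many-to-many relationship with a linking table, the used_in relationship between genes and experiments could be stored as below. The column names and values are assumptions made for illustration, and the example uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical data: gene G1 is used in two experiments, and
# experiment 10 uses two genes, so the relationship is many-to-many.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id TEXT PRIMARY KEY);
    CREATE TABLE experiment (experiment_id INTEGER PRIMARY KEY);
    -- Linking table: one row per (gene, experiment) pair.
    CREATE TABLE used_in (
        gene_id TEXT NOT NULL REFERENCES gene(gene_id),
        experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
        PRIMARY KEY (gene_id, experiment_id)
    );
    INSERT INTO gene VALUES ('G1'), ('G2');
    INSERT INTO experiment VALUES (10), (20);
    INSERT INTO used_in VALUES ('G1', 10), ('G1', 20), ('G2', 10);
""")

# Navigate the relationship in both directions through the linking table.
genes_in_10 = [row[0] for row in conn.execute(
    "SELECT gene_id FROM used_in WHERE experiment_id = ? ORDER BY gene_id",
    (10,),
)]
experiments_for_g1 = [row[0] for row in conn.execute(
    "SELECT experiment_id FROM used_in WHERE gene_id = ? ORDER BY experiment_id",
    ("G1",),
)]
```

The composite primary key keeps each gene and experiment pair unique, so the many-to-many relationship becomes two one-to-many relationships that meet in the linking table.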
24
Example E-R Diagram
[Figure: E-R diagram with the Expt-Exptr, Expt-Sample, and Expt-Array relationships; notation shown for a multivalued attribute and a many-to-one relationship]
25
Transforming E-R to Relational Database
  • Entities and relationships are translated to
    relations or tables
  • Attributes of an entity are translated to columns
    or fields
  • The identifying attribute forms the primary key
    of a table
  • Each instance of an entity becomes a record or
    row

26
E-R Diagram to Relational Schema
Multivalued attribute
27
Object Oriented Model
  • Object Oriented Model allows real world data to
    be represented as objects.
  • Objects encapsulate the data and provide methods
    to access or manipulate it.
  • Objects with a specific structure and set of
    methods are said to belong to an object class.
  • Allows new classes to be created by extending
    the description of the parent class.
  • Child classes inherit the data and methods of the
    parent class.
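The ideas of encapsulation and inheritance can be sketched in a few lines of Python. The class names and attributes here (Sample, TumorSample, tissue) are hypothetical and are not taken from any particular object-oriented database.

```python
class Sample:
    """Parent class: data and methods shared by every sample."""

    def __init__(self, sample_id, organism):
        # The data is kept inside the object and accessed through methods.
        self._sample_id = sample_id
        self._organism = organism

    def describe(self):
        return f"{self._sample_id} ({self._organism})"


class TumorSample(Sample):
    """Child class: extends the parent's description with a tissue."""

    def __init__(self, sample_id, organism, tissue):
        super().__init__(sample_id, organism)  # inherit the parent's data
        self._tissue = tissue

    def describe(self):
        # Reuse the inherited method and add the new information.
        return f"{super().describe()}, tissue: {self._tissue}"


s = TumorSample("S1", "Homo sapiens", "liver")
description = s.describe()
```

A TumorSample can be used anywhere a Sample is expected, which is the practical benefit of extending a parent class instead of copying it.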

28
Example
OODBMS
29
Object relational data model
  • Improves on the relational model by adding some
    features from object data models.
  • Information is represented as in relational
    models, but column values are not restricted to a
    single value: multiple values are allowed.
  • Example (sample table in previous slides)

30
Queries, queries, queries!!
  • Given a collection of microarray-generated gene
    expression data, what kinds of questions do users
    wish to pose?
  • Constructing an extensive list of interesting
    queries and data mining problems that have to be
    supported by the database will facilitate the
    design process.

31
Queries, queries, queries!!
  • Queries on the data:
  • Which genes are linked?
  • Which genes are expressed similarly to my gene
    XYZ?
  • Which genes change their expression in a
    second condition?
  • Which genes are co-expressed in differing
    conditions?
  • Classification (of tumors, diseased tissues
    etc.): which patterns are characteristic of a
    certain class of samples, and which genes are
    involved?
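One way a query such as "which genes are expressed similarly to my gene XYZ?" might be answered is by ranking genes on the correlation of their expression profiles. The slides do not name a similarity measure, so the use of Pearson correlation here is an assumption, and the gene names and expression values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical gene expression matrix: one row of values per gene,
# one column per experimental condition.
expression = {
    "XYZ": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}


def most_similar(expression, query):
    """Rank all other genes by correlation with the query gene, best first."""
    scores = [
        (gene, pearson(expression[query], profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = most_similar(expression, "XYZ")
```

In this toy matrix geneA rises with XYZ and geneB falls as XYZ rises, so they land at opposite ends of the ranking.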

32
More Queries !!!
  • Queries that link in additional knowledge:
  • Functional classification of genes: are changes
    clustered in particular classes?
  • Metabolic pathway information: is a certain
    pathway, or a route in a pathway, affected?
  • Disease information: does clinical follow-up
    correlate with expression patterns?
  • Phenotype information for mutants: are there
    correlations between particular phenotypes and
    expression patterns?

33
More Queries !!!
  • In what region of the genome is the interesting
    gene located?
  • Is there synteny in this region with other
    species?
  • Is there a known trait that maps to this region?

34
Query Language
  • Language in which user requests information from
    the database.
  • SQL
  • Data definition helps you implement your model,
    and data manipulation helps you modify and
    retrieve data
  • Advantages
  • Can specify query declaratively and let database
    system figure out best way of finding answers
  • Supports queries of medium complexity
  • Specialized languages
  • SQL language statements are not abstract but very
    close to spoken language.

35
Basic SQL Queries
  • Find the image for experiment number 1345:
  • select image from experiment where
    experiment_id = 1345
  • Find the experiment_id and image of all
    experiments involving E. coli:
  • select experiment_id, image from experiment,
    sample where experiment.sample_id =
    sample.sample_id and sample.organism = 'E. coli'
  • All combinations of rows from the relations in
    the from clause are considered, and those that
    satisfy the where conditions are output.
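The two queries above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the schema, the image file names, and the sample rows are hypothetical, and the identifiers use underscores (experiment_id, sample_id) because a hyphen is not valid in an unquoted SQL name.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        organism TEXT NOT NULL
    );
    CREATE TABLE experiment (
        experiment_id INTEGER PRIMARY KEY,
        image TEXT NOT NULL,
        sample_id INTEGER NOT NULL REFERENCES sample(sample_id)
    );
    INSERT INTO sample VALUES (1, 'E. coli'), (2, 'S. cerevisiae');
    INSERT INTO experiment VALUES
        (1345, 'scan_1345.tif', 1),
        (1346, 'scan_1346.tif', 2),
        (1347, 'scan_1347.tif', 1);
""")

# Query 1: the image for experiment number 1345.
image = conn.execute(
    "SELECT image FROM experiment WHERE experiment_id = ?", (1345,)
).fetchone()[0]

# Query 2: experiment_id and image of all experiments involving E. coli.
# Listing both tables in FROM considers every row combination; the WHERE
# clause keeps only the pairs whose sample_id values match.
ecoli = conn.execute(
    """
    SELECT experiment_id, image
    FROM experiment, sample
    WHERE experiment.sample_id = sample.sample_id
      AND sample.organism = ?
    ORDER BY experiment_id
    """,
    ("E. coli",),
).fetchall()
```

Passing values through `?` placeholders instead of pasting them into the SQL text avoids quoting mistakes and SQL injection.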

36
(No Transcript)
37
Gene Expression Databases Require Integration
  • There are many different types of data presenting
    numerous relationships.
  • There are a number of Databases with lots of
    information.
  • Experiments need to be compared because the
    experiments are very difficult to perform and
    very expensive.
  • Solution: make all the databases talk the same
    language.
  • XML was chosen as the data interchange format.

38
Why XML?
  • XML provides a method for defining the meaning,
    or semantics, of data.
  • Example: an XML file of the earlier table we
    defined

<gene_features>
  <gene_id>GBVN32</gene_id>
  <contig_id>NT_010651</contig_id>
  <contig_start>2354807</contig_start>
  <contig_end>2360778</contig_end>
  <contig_strand>Complement</contig_strand>
</gene_features>
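A record like this one can be read with Python's standard xml.etree.ElementTree module. The sketch below reuses the element names and values from the slide; treating every child element as one field of a flat record is an assumption that holds for this example only.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<gene_features>
  <gene_id>GBVN32</gene_id>
  <contig_id>NT_010651</contig_id>
  <contig_start>2354807</contig_start>
  <contig_end>2360778</contig_end>
  <contig_strand>Complement</contig_strand>
</gene_features>
""".strip()

root = ET.fromstring(XML_TEXT)

# Each child element's tag names a field and its text holds the value.
record = {child.tag: child.text for child in root}

# Element text is always a string, so convert numeric fields explicitly.
span = int(record["contig_end"]) - int(record["contig_start"])
```

Because the tags name each value, a program can interpret the record without relying on column order or position.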
39
Mapping XML to Relational Database
  • The data structure in XML is defined in a Document
    Type Definition (DTD) as follows:
  • <!ELEMENT gene_id (#PCDATA)>
  • <!ELEMENT contig_id (#PCDATA)>
  • <!ELEMENT contig_start (#PCDATA)>
  • <!ELEMENT contig_end (#PCDATA)>
  • <!ELEMENT contig_sequence (#PCDATA)>
  • This kind of DTD also helps us to control the
    vocabulary used.
  • SQL:
  • create table gene (
  • gene_id varchar(5) primary key,
  • contig_id varchar(10) not null,
  • contig_start integer not null,
  • contig_end integer not null,
  • contig_sequence text not null)
  • So the DTD can be mapped directly onto a
    relational database table.
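The mapping from XML elements to table columns can be sketched with Python's standard library. The create table statement is the one from the slide. The contig_sequence value is a made-up placeholder, and the insert_gene helper is hypothetical, written only for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One column per DTD element, in the order used by the create table statement.
COLUMNS = ("gene_id", "contig_id", "contig_start", "contig_end", "contig_sequence")

# The contig_sequence value below is an invented placeholder.
XML_TEXT = """
<gene_features>
  <gene_id>GBVN32</gene_id>
  <contig_id>NT_010651</contig_id>
  <contig_start>2354807</contig_start>
  <contig_end>2360778</contig_end>
  <contig_sequence>ACGTACGT</contig_sequence>
</gene_features>
""".strip()

conn = sqlite3.connect(":memory:")
# Note: SQLite does not enforce the declared varchar lengths.
conn.execute("""
    create table gene (
        gene_id varchar(5) primary key,
        contig_id varchar(10) not null,
        contig_start integer not null,
        contig_end integer not null,
        contig_sequence text not null)
""")


def insert_gene(conn, xml_text):
    """Insert one gene_features record, mapping each element to its column."""
    root = ET.fromstring(xml_text)
    # findtext returns None for a missing element, which the NOT NULL
    # constraints then reject with sqlite3.IntegrityError.
    values = [root.findtext(column) for column in COLUMNS]
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"insert into gene ({', '.join(COLUMNS)}) values ({placeholders})",
        values,
    )


insert_gene(conn, XML_TEXT)
row = conn.execute(
    "select gene_id, contig_start, contig_end from gene"
).fetchone()
```

The integer columns convert the element text to numbers on insert, so the table holds typed values even though XML carries everything as text.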

40
MAGE-ML As Data Interchange Format
[Figure: diagram linking expression data, a converter (program), MAGE-ML, and databases]