Challenges in Bioinformatics Part II - PowerPoint PPT Presentation

1
Challenges in Bioinformatics, Part II

Vasileios Hatzivassiloglou
University of Texas at Dallas

2
More observations on GloFish

GloFish license prohibits:
intentional breeding
sale/barter/trade of offspring
Attitudes on recombinant DNA / genetic engineering in Europe
Le Monde Diplomatique (January 2004)
Frankenfish and the future

3
Sequence analysis

Given a genome sequence, determine the gene regions
Uses data from observed output (expression)
Models based on transducers and Markov models
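
One common Markov-model approach to gene finding is a hidden Markov model decoded with the Viterbi algorithm. The slide does not specify a model, so the sketch below is only a toy under stated assumptions. It uses two hypothetical states, "-" (intergenic) and "+" (gene). All start, transition, and emission probabilities are made up, including the assumption that gene regions are GC-rich. Real gene finders use many more states, for example codon positions, exons, and introns.

```python
import math

# Toy two-state HMM; every number here is an illustrative assumption.
# "-" = intergenic region, "+" = gene region.
STATES = ("-", "+")
START = {"-": 0.9, "+": 0.1}
TRANS = {"-": {"-": 0.9, "+": 0.1},
         "+": {"-": 0.1, "+": 0.9}}
# Assumption: gene regions are GC-rich, intergenic regions AT-rich.
EMIT = {"-": {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2},
        "+": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35}}


def viterbi(seq):
    """Return the most probable state label ('-' or '+') for each base."""
    # score[s]: best log-probability of any state path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the whole path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCGCGCATATAT"))  # prints ------++++++++++------
```

Working in log space avoids numerical underflow on long genomic sequences.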

4
String Similarity

Two issues:
Given two sequences, what is their similarity?
How are two similar sequences aligned?
Global alignment: algorithms based on dynamic programming
Local alignment: FASTA, BLAST
Useful for homology search (finding related genes/proteins in different organisms)
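
The dynamic-programming approach to global alignment is usually called Needleman-Wunsch. The sketch below is minimal and assumes a linear gap penalty. The scores (match +1, mismatch -1, gap -1) are illustrative defaults, not values from the slides. FASTA and BLAST are faster heuristics for local alignment and are not shown here.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                           dp[i - 1][j] + gap,                          # gap in b
                           dp[i][j - 1] + gap)                          # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # prints (2, 'ACGT', 'A-GT')
```

Both the time and the memory grow with len(a) * len(b). This cost is why heuristic local search tools are preferred when scanning large databases.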

5
Motif finding

Motifs are short sequences in DNA or proteins that are over-represented in a sample
Find motifs and understand their role
Typically with statistical word-counting methods
Alternatively, with a predictive model and the EM (expectation-maximization) algorithm
Important for recognizing promoter regions
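
Statistical word counting can be sketched as follows. The code counts every k-letter word across the sample and compares each count with the count expected by chance. The background model assumed here is deliberately crude: bases are uniform and independent, so any given k-mer is expected about (number of positions) / 4^k times. Real motif finders use better background models and significance tests. The example sequences are made up, each with an embedded TATAAT, a well-known bacterial promoter element.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every length-k word (overlapping) across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented(seqs, k, top=3):
    """Return (kmer, count, observed/expected) for the most frequent k-mers.

    Assumes a uniform, independent background over A/C/G/T.
    """
    counts = kmer_counts(seqs, k)
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    expected = positions / 4 ** k
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(kmer, n, n / expected) for kmer, n in ranked[:top]]


sample = ["GCTATAATGC", "ATTATAATCG", "CGTATAATTA"]  # hypothetical sequences
print(overrepresented(sample, 6, top=1))
```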

6
Clustering

Genes that are co-expressed in a DNA microarray experiment may be functionally related
Cluster genes based on:
sequence similarity, or
expression data
Issues include feature selection, outlier detection, and validation of clusters (mathematically and biologically)
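
One simple way to cluster on expression data is k-means over per-gene expression profiles. The sketch below makes several assumptions: Euclidean distance, centroids initialized from the first k profiles, and made-up expression values. Hierarchical clustering and correlation-based distances are equally common choices. As the slide notes, choosing features and validating the resulting clusters remain separate problems.

```python
def kmeans(profiles, k, max_iter=100):
    """Plain k-means on expression profiles (lists of floats).

    Centroids start at the first k profiles (a simplifying assumption).
    Returns one cluster index per profile.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [None] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression levels for four genes across three conditions.
genes = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 1.0, 0.0], [5.2, 0.9, 0.1]]
print(kmeans(genes, 2))  # prints [0, 0, 1, 1]
```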

7
Classification

Detect particular types of proteins or genes (e.g., find the gene responsible for a disease, find highly active proteins)
HIV and leukemia as applications
Multiple models based on different feature spaces
Many algorithms, including decision trees, kNN (k-nearest neighbors), and support vector machines (SVM)
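
Of the algorithms listed, kNN is the simplest to sketch. A new sample receives the majority label of its k closest training samples. In the minimal version below, the feature vectors, the labels "healthy"/"disease", and k = 3 are all hypothetical, and the distance is assumed to be Euclidean.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (e.g., two gene-expression measurements).
train = [([0.1, 0.2], "healthy"), ([0.2, 0.1], "healthy"), ([0.0, 0.0], "healthy"),
         ([2.0, 2.1], "disease"), ([1.9, 2.2], "disease"), ([2.2, 1.8], "disease")]
print(knn_predict(train, [2.0, 2.0]))  # prints disease
```

Choosing an odd k avoids tied votes in two-class problems.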

8
Protein folding

Proteins fold into a native state
3D properties are important for drug interactions (pharmacogenomics)
Variations from the native state can be indicators of disease (Alzheimer's, mad cow disease)
Folding depends on chemical interactions and thermodynamics
Direct simulation is impractical (1 day of computation for 1 ns of simulated folding)
Treat as an optimization problem with approximate solutions (minimize an energy function)
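
The slide does not name an energy function. To illustrate "folding as energy minimization", the sketch below uses a different, standard textbook simplification: the 2D HP lattice model. Each residue is either hydrophobic (H) or polar (P) and is placed on a square grid. Every pair of H residues that are grid neighbours but not consecutive in the chain contributes -1. The search here is exhaustive, which is feasible only for very short toy chains; realistic problems need the approximate methods the slide mentions.

```python
from itertools import product

# Unit steps on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Place residues on the grid; return None if the chain overlaps itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (coords[-1][0] + dx, coords[-1][1] + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def energy(seq, coords):
    """HP-model energy: -1 per H-H pair that are lattice neighbours but
    not consecutive in the chain (each pair counted once)."""
    pos = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = pos.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1
    return e


def best_fold(seq):
    """Exhaustively search all self-avoiding folds; returns (energy, coords)."""
    best = (0, None)
    for moves in product(MOVES, repeat=len(seq) - 1):
        coords = fold(moves)
        if coords is not None:
            e = energy(seq, coords)
            if best[1] is None or e < best[0]:
                best = (e, coords)
    return best


print(best_fold("HPPHPH")[0])  # prints -2
```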

9
Protein Folding: Prior Knowledge

Knowledge from other proteins can help
Find similar primary structure in other proteins with known folding
Folding can be predicted (at least locally) and experimentally verified

10
Phylogenetics

Recovery of the tree of life
Important for molecular biology because:
we can predict missing sequences from known sequences in related species
we can predict function from related genes/proteins in another species

11
Phylogenetics

Goal: classify species based on data
Cladistics
Characters are the differentiating features, which have different states (e.g., flower color)
Construct a matrix of features and automatically locate the best discriminating feature
Misleading evidence can exist (homoplasies) due to convergent evolution, e.g., wings in insects and birds

12
Phylogenetics: Other approaches

Maximum likelihood models
Estimate the probability of change for each character, and arrange the species in the maximum likelihood tree
Molecular systematics
Measure similarity between short sequences of DNA (haplotypes) and use clustering to create the tree
Can also use mRNA
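
The slide does not name a clustering method. One classic choice for building a tree from pairwise sequence distances is UPGMA (average-linkage hierarchical clustering), which the minimal sketch below uses. It assumes the haplotypes are already aligned and of equal length, and it measures distance as the number of differing sites. It outputs only the tree topology as a Newick-style string, without branch lengths. The sequences and names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """UPGMA on a dict {name: aligned_sequence}; returns a Newick-style topology."""
    clusters = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {frozenset((a, b)): hamming(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster = size-weighted average of old distances.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


haplotypes = {"dog1": "ACGTACGTAA", "dog2": "ACGTACGTAT",
              "wolf1": "ACGAACGTTT", "coyote": "TCCAAGGTTT"}
print(upgma(haplotypes))  # prints (coyote,(wolf1,(dog1,dog2)))
```

UPGMA assumes a roughly constant rate of change along all lineages. When rates differ, methods such as neighbor joining or the maximum likelihood approach above are usually preferred.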

13
Phylogeny of the dog (1997)

Haplotype based on a sequence of 261 bp
Compared 72 dog breeds and 27 geographically based groups of wolves, as well as a control group of other canids (coyotes, jackals)
27 haplotypes for dogs, 26 for wolves
Maximum differences: 10 (dog), 12 (wolf), 12 (dog vs. wolf)
Minimum difference: 20 (dog vs. other canids)
Evidence shows:
Dogs evolved from wolves
Four classes of dogs, some more homogeneous than others
Evidence conflicts with the current dog breed classification
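
The "maximum differences" and "minimum difference" figures are counts of differing sites between pairs of haplotypes. The first is the largest such count within a group; the second is the smallest between two groups. The sketch below shows how those two summaries are computed. The short haplotypes are made up, since the 261 bp sequences from the study are not reproduced here.

```python
from itertools import combinations, product


def differences(a, b):
    """Number of differing sites between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def max_within(group):
    """Largest pairwise difference inside one group."""
    return max(differences(a, b) for a, b in combinations(group, 2))


def min_between(group1, group2):
    """Smallest difference between any member of group1 and any of group2."""
    return min(differences(a, b) for a, b in product(group1, group2))


dogs = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]  # hypothetical haplotypes
others = ["GGGGAAAA"]                        # hypothetical outgroup
print(max_within(dogs), min_between(dogs, others))  # prints 2 4
```

A minimum between-group difference larger than every within-group maximum is the kind of pattern that separates dogs from other canids on the slide.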

14
Other important computational issues

Data storage
Efficient database access and search
Text search and information retrieval
Lossless compression
Database interactions
Representation and visualization
Image processing
Robotics
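
For lossless compression, the four-letter DNA alphabet allows a simple baseline: store each base in 2 bits instead of an 8-bit character, a 4:1 saving. The minimal sketch below assumes the sequence contains only A, C, G, and T; ambiguity codes such as N would need an escape scheme that is not shown. The original length is stored alongside the bytes because leading A's (code 0) would otherwise be lost.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, 2 bits per base.

    Returns (data, length); length is needed to decode leading A's.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    nbytes = (2 * len(seq) + 7) // 8
    return value.to_bytes(nbytes, "big"), len(seq)


def unpack(data, length):
    """Inverse of pack: recover the original base string."""
    value = int.from_bytes(data, "big")
    out = []
    for _ in range(length):
        out.append(BASES[value & 3])  # lowest two bits = last base
        value >>= 2
    return "".join(reversed(out))


data, n = pack("ACGT")
print(data, unpack(data, n))  # prints b'\x1b' ACGT
```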

15
Curation versus discovery

Much easier (and faster) to have an expert check system results than to produce such results from scratch
Automated discovery followed by curation:
increases thoroughness
potentially removes bias (assuming the system is not biased)

16
Curation

Experimental results need to be verified by experts
This is a large and time-consuming task
How can it be facilitated?
Interface issues, access to primary data, access to literature
Concurrent verification
Can it be modeled and automated?
AI and statistical models

17
Knowledge modeling

Models of biological processes and the steps in them (e.g., actions in regulatory networks)
Needed to support automated processing of extracted data
Distinct from data extraction

18
Examples of modeled knowledge