Challenges in Bioinformatics Part II - PowerPoint PPT Presentation

About This Presentation
Title:

Challenges in Bioinformatics Part II

Provided by: VasileiosH9

Transcript and Presenter's Notes


1
Challenges in Bioinformatics Part II
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
More observations on GloFish
  • GloFish license prohibits
  • intentional breeding
  • sale/barter/trade of offspring
  • Attitudes on recombinant DNA / genetic
    engineering in Europe
  • Le Monde Diplomatique (January 2004)
  • Frankenfish and the future

3
Sequence analysis
  • Given a genome sequence, determine the gene
    regions
  • Uses data from observed output (expression)
  • Models based on transducers and Markov Models
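
As an illustration of the Markov-model approach mentioned above, here is a minimal sketch of Viterbi decoding for a two-state hidden Markov model. The states ("G" for gene-like, "N" for non-gene), the GC-rich versus AT-rich emission tables, and all probabilities are hypothetical toy values chosen for the example; they do not come from the slides or from any real gene finder.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for seq under an HMM."""
    # Work in log space to avoid numeric underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = []
    for symbol in seq[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for being in state s at this position.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            col[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][symbol])
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy parameters (assumptions, not estimated from data).
STATES = ("G", "N")
START = {"G": 0.5, "N": 0.5}
TRANS = {"G": {"G": 0.9, "N": 0.1}, "N": {"G": 0.1, "N": 0.9}}
EMIT = {
    "G": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

labels = "".join(viterbi("ATATGCGCGCATAT", STATES, START, TRANS, EMIT))
print(labels)
```

Real gene finders use far richer state structures (codon position, splice sites, and so on); the sketch only shows how a state path is recovered from a sequence.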

4
String Similarity
  • Two issues
  • Given two sequences, what is their similarity?
  • How are two similar sequences aligned?
  • Global alignment algorithms based on dynamic
    programming
  • Local alignment: FASTA, BLAST
  • Useful for homology search (finding related genes
    / proteins in different organisms)
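
The dynamic-programming idea behind global alignment can be sketched with the Needleman-Wunsch recurrence. The scoring values (match +1, mismatch -1, gap -1) are assumptions picked for illustration; practical tools use substitution matrices and affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment: return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

result = global_align("GATTACA", "GATACA")
print(result[0])
print(result[1])
print(result[2])
```

The table takes time and memory proportional to the product of the two lengths, which is one reason heuristic local-alignment tools such as FASTA and BLAST are preferred for database-scale homology search.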

5
Motif finding
  • Motifs are short sequences in DNA or proteins
    that are over-represented in a sample
  • Find motifs and understand their role
  • Typically with statistical word counting
    methods
  • Alternatively, with a predictive model and the EM
    (expectation-maximization) algorithm
  • Important for
  • recognizing promoter regions
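
A minimal sketch of the word-counting approach: count every k-mer in a set of sequences and compare each observed count with the count expected if bases were drawn independently. The input sequences, the function names, and the independence assumption for the background model are all illustrative choices, not part of the original slides.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def observed_vs_expected(seqs, k):
    """Map each k-mer to (observed count, expected count under independent bases)."""
    base = Counter("".join(seqs))
    total = sum(base.values())
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    table = {}
    for word, obs in kmer_counts(seqs, k).items():
        p = 1.0
        for ch in word:
            p *= base[ch] / total  # background frequency of each base
        table[word] = (obs, p * positions)
    return table

# Hypothetical toy sequences that share the word TATAAA.
seqs = ["CGTATAAAGC", "GGTATAAACC", "TATAAAGCGC"]
counts = kmer_counts(seqs, 6)
print(counts.most_common(1))
```

On inputs this small the observed/expected ratios are too noisy to be meaningful; a real analysis would use many sequences and a proper significance test.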

6
Clustering
  • Genes that are co-expressed in a DNA microarray
    experiment may be functionally related
  • Cluster genes on
  • sequence similarity, or
  • expression data
  • Issues include feature selection, outlier
    detection, validation of clusters (mathematically
    and biologically)
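
One simple way to cluster genes on expression data is to link any two genes whose expression profiles are strongly correlated and take the connected groups as clusters (single linkage). The sketch below assumes hypothetical gene names, made-up expression values, and an arbitrary correlation threshold of 0.9; it also assumes no profile is constant, since Pearson correlation is undefined in that case.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_clusters(profiles, threshold=0.9):
    """Group genes whose profiles correlate at or above threshold (single linkage)."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                parent[find(g)] = find(h)
    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical expression levels over four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4.5, 2],
}
clusters = coexpression_clusters(profiles)
print(clusters)
```

The threshold plays the role of the validation question raised above: different cut-offs give different clusters, and deciding which grouping is biologically meaningful is a separate step.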

7
Classification
  • Detect particular types of proteins or genes
    (e.g., find the gene responsible for a disease,
    find highly active proteins)
  • HIV and leukemia as applications
  • Multiple models based on different feature spaces
  • Many algorithms including decision trees, kNN,
    and support vector machines (SVM)
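
Of the algorithms listed, k-nearest neighbors (kNN) is the simplest to sketch: a new sample receives the majority label of its k closest training samples. The two-dimensional feature vectors and the "healthy"/"disease" labels below are invented for illustration only.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two expression features per sample.
train = [
    ((1.0, 1.2), "healthy"),
    ((0.9, 1.0), "healthy"),
    ((1.1, 0.8), "healthy"),
    ((3.0, 3.2), "disease"),
    ((3.1, 2.9), "disease"),
    ((2.8, 3.0), "disease"),
]
prediction = knn_predict(train, (2.9, 3.1))
print(prediction)
```

The choice of feature space matters as much as the algorithm, which is why the slide mentions multiple models built on different feature spaces.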

8
Protein folding
  • Proteins fold into a native state
  • 3D properties important for drug interactions
    (pharmacogenomics)
  • Variations from the native state can be
    indicators of disease (Alzheimer's, mad cow
    disease)
  • Folding depends on chemical interactions and
    thermodynamics
  • Direct simulation impractical (about 1 day of
    computation for 1 ns of folding)
  • Treat as an optimization problem with approximate
    solutions (minimize an energy function)

9
Protein Folding: Prior Knowledge
  • Knowledge from other proteins can help
  • Find similar primary structure in other proteins
    with known folding
  • Folding can be predicted (at least locally) and
    experimentally verified

10
Phylogenetics
  • Recovery of the tree of life
  • Important for molecular biology because
  • we can predict missing sequences from known
    sequences in related species
  • we can predict function from related
    genes/proteins in another species

11
Phylogenetics
  • Goal: Classify species based on data
  • Cladistics
  • Characters are the differentiating features which
    have different states (e.g., flower color)
  • Construct matrix of features and automatically
    locate best discriminating feature
  • Misleading evidence can exist (homoplasies) due
    to convergent evolution, e.g.,
  • wings in insects and birds

12
Phylogenetics: Other approaches
  • Maximum likelihood models
  • Estimate probability of change for each
    character, and arrange species in maximum
    likelihood tree
  • Molecular systematics
  • Measure similarity between short sequences of DNA
    (haplotypes) and use clustering to create the
    tree
  • Can also use mRNA
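
The molecular-systematics idea above (measure similarity between short DNA sequences, then cluster to create the tree) can be sketched with Hamming distance and average-linkage clustering (UPGMA). UPGMA is one common choice, named here as an assumption; the slides do not say which clustering method was used. The sequences and taxon names are made up for the example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def upgma(seqs):
    """Average-linkage clustering of named sequences; returns a nested-tuple tree."""
    # Each key is a subtree (a name or a tuple); each value lists its member names.
    clusters = {name: [name] for name in seqs}

    def avg_dist(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(hamming(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        keys = list(clusters)
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(((p, q) for i, p in enumerate(keys) for q in keys[i + 1:]),
                   key=lambda pq: avg_dist(*pq))
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))

# Hypothetical 8 bp haplotypes (not real data).
haplotypes = {
    "dog1": "ACGTACGT",
    "dog2": "ACGTACGA",
    "wolf1": "ACGAACCA",
    "coyote": "TTGATCCA",
}
tree = upgma(haplotypes)
print(tree)
```

In the resulting nested tuple, the two most similar sequences are joined first and the most divergent sequence joins last, which is how an outgroup ends up at the root of the tree.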

13
Phylogeny of the dog (1997)
  • Haplotype based on a sequence of 261 bp
  • Compared 72 dog breeds and 27 geographically-based
    groups of wolves as well as a control group of
    other canids (coyotes, jackals)
  • 27 haplotypes for dogs, 26 for wolves
  • Maximum differences 10 (dog), 12 (wolf), 12
    (dog vs. wolf)
  • Minimum difference 20 (dog vs. other canids)
  • Evidence shows
  • Dog evolved from wolf
  • Four classes of dogs, some more homogeneous
  • Evidence conflicts with current dog breed
    classification

14
Other important computational issues
  • Data storage
  • Efficient database access and search
  • Text search and information retrieval
  • Lossless compression
  • Database interactions
  • Representation and visualization
  • Image processing
  • Robotics

15
Curation versus discovery
  • Much easier (and faster) to have an expert check
    system results, than produce such results from
    scratch
  • Automated discovery followed by curation
  • increases thoroughness
  • potentially removes bias (assuming system is not
    biased)

16
Curation
  • Experimental results need to be verified by
    experts
  • This is a large and time-consuming task
  • How can it be facilitated?
  • Interface issues, access to primary data, access
    to literature
  • Concurrent verification
  • Can it be modeled and automated?
  • AI and statistical models

17
Knowledge modeling
  • Models of biological processes and the steps in
    them (e.g., actions in regulatory networks)
  • Needed to support automated processing of
    extracted data
  • Distinct from data extraction

18
Examples of modeled knowledge
  • Types of protein actions (bind, activate,
    phosphorylate, ...)
  • Constraints on actions
  • (Functional) classes of proteins
  • Ontologies of concepts in the biological domain
  • Much of this is derived via text mining