Mark Gerstein, Yale University - PowerPoint PPT Presentation

(c) M Gerstein, 2006, Yale, gersteinlab.org




1
BIOINFORMATICS Summary
  • Mark Gerstein, Yale University
  • gersteinlab.org/courses/452
  • (last edit in fall '06, handout version,
    including in-class changes)

2
Used in class M11 2006.12.06
3
You'll Forget
From S. Harris's Science Cartoons,
http://www.sciencecartoonsplus.com
4
So I'll distill
5
What is Bioinformatics?
  • (Molecular) Bio - informatics
  • One idea for a definition? Bioinformatics is
    conceptualizing biology in terms of molecules (in
    the sense of physical chemistry) and then
    applying informatics techniques (derived from
    disciplines such as applied math, CS, and
    statistics) to understand and organize the
    information associated with these molecules, on
    a large scale.
  • Bioinformatics is MIS for Molecular Biology
    Information. It is a practical discipline with
    many applications.

6
Data Types
7
"Core" Bioinformatics
  • Core Stuff
  • Computing with sequences and structures
  • protein structure prediction
  • biological databases and mining them
  • New Stuff: Networks and Expression Analysis
  • Will teach these in CS 545 (Data Mining) next
    semester
  • Fairly Speculative: simulating cells

8
Hierarchical Structure of Course Information
  • Memorize the previous summary
  • Good familiarity with main points in lectures
    (quizzes)
  • Rest of overheads and readings for reference on
    projects and

9
Cross-cutting Themes
  • Algorithms for Comparison
  • Dynamic programming
  • Different measures of similarity (RMS vs.
    structural similarity; PAM and BLOSUM vs. ID)
  • Generalized similarity matrix in threading
  • Statistical scoring schemes (with P-values)
  • For sequences, structures, sequence to structure,
    and even expression data
  • Time complexity of the comparisons
  • Predictions
  • LOD scores (with features / expectation)
  • Progressively more complex features
  • Amount of feature information IN vs. prediction
    OUT
  • Testing against benchmarks with
    cross-validation (sec. struc. prediction, seq.
    comparison scoring, datamining)
  • Other methods, need for heuristics
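As a concrete instance of dynamic programming for comparison, here is a minimal sketch of global alignment (Needleman-Wunsch) in Python. The function name `global_align` and the toy match/mismatch/gap scores are assumptions for illustration; a real protein comparison would swap in a PAM or BLOSUM matrix for the simple match/mismatch rule. The doubly nested fill loop makes the O(nm) time complexity of the comparison explicit.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy scores).

    Returns (score, aligned_a, aligned_b). Time and memory are O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning prefix a[:i] with prefix b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_align("GATTACA", "GATACA")` scores 4: six matches and a single gap. Turning such raw scores into P-values, as in the statistical scoring theme, would additionally need a background score distribution, which this sketch does not model.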

10
Cross-cutting Themes
  • Increasing the chemical reality and complexity
    of genes
  • Character strings, fold (just Cα atoms), volumes and
    surfaces from all-atom representation, energy and
    minimization, dynamics (time and velocity)
  • Simulation
  • Vector configuration boiled down to a scalar E
    through a potential
  • Compute-intensive exploration of configurations
    (MC, MD)
  • Averages over correctly weighted configurations
  • Importance of simplification
  • The Survey Mode
  • Collecting information in DB tables
  • Importance of integration and interoperation
  • Organizing it around "part" classifications
  • Surveying it for useful statistics (taking into
    account biases)
  • Doing datamining to find more tenuous
    relationships
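The simulation theme (a vector configuration reduced to a scalar E through a potential, then averaged over correctly weighted configurations) can be illustrated with a minimal Metropolis Monte Carlo sketch. The harmonic potential and the names `harmonic_energy` and `metropolis_average` are hypothetical choices for illustration, not code from the course; any function mapping a coordinate list to a float could stand in for the potential.

```python
import math
import random


def harmonic_energy(x, k=1.0):
    """Toy potential (assumption): each coordinate is tethered to 0 by a spring.
    Maps the vector configuration x to a single scalar energy E."""
    return 0.5 * k * sum(xi * xi for xi in x)


def metropolis_average(energy, x0, kT=1.0, step=1.0, n_steps=200_000, seed=0):
    """Metropolis Monte Carlo: sample configurations with Boltzmann weight
    exp(-E/kT) and return the average energy over the sampled configurations."""
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    total = 0.0
    for _ in range(n_steps):
        # Propose a random displacement of one randomly chosen coordinate
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step, step)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill ones with probability exp(-dE/kT)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
        total += e  # a rejected move counts the old configuration again
    return total / n_steps
```

For three harmonic coordinates at kT = 1, equipartition predicts an average energy of 1.5, and `metropolis_average(harmonic_energy, [0.0, 0.0, 0.0])` should land close to that. Molecular dynamics (MD) would explore the same configurations by integrating equations of motion instead of making random moves.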

11
Anti-Themes
12
Depth v Breadth
13
Historical Perspective
  • Single Structures
  • Modeling Geometry
  • Forces, Simulation
  • Docking
  • Sequences, Sequence-Structure Relationships
  • Alignment
  • Structure Prediction
  • Fold recognition
  • Genomics
  • Dealing with many sequences
  • Gene finding, Genome Annotation
  • Databases
  • Integrative Analysis
  • Expression, Proteomics Data
  • Datamining
  • Simulation again.

14
(from CooperToons, http://members.aol.com/ChipCooper/cartoon26.html)