Homework 1 and 2 review session - PowerPoint PPT Presentation

About This Presentation
Title:

Homework 1 and 2 review session

Description:

Title: PowerPoint Presentation Author: Kirill Last modified by: Kyrylo Bessonov Created Date: 8/16/2006 12:00:00 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Transcript and Presenter's Notes

Title: Homework 1 and 2 review session


1
Homework 1 and 2 review session
  • Presented by
  • Kirill Bessonov
  • November 2012

2
HW1 classical Q&A (GenomeGraphs) (1)
  • The first two questions were on Bioconductor
    libraries. There are 608 BioC packages
  • To get the citation for a particular library use
  • citation("library_name")
  • You were asked to get genomic data on a specific
    gene
  • library(GenomeGraphs)
  • # download the whole database of Ensembl IDs
  • ensembl_Human_Genes <- useMart("ensembl",
    dataset = "hsapiens_gene_ensembl")
  • # get info on the gene from the database using the
    # Ensembl ID
  • gene <- makeGene(id = "ENSG00000115145",
    type = "ensembl_gene_id",
    biomart = ensembl_Human_Genes)
  • # get info on the transcript
  • transcript <- makeTranscript(id = "ENSG00000115145",
    type = "ensembl_gene_id",
    biomart = ensembl_Human_Genes)
  • gdPlot(list("gene" = gene,
    "transcripts" = transcript))
  • # retrieve info from the database, displaying the
    # first 25 entries
  • getBM(c("ensembl_gene_id", "hgnc_symbol",
    "description"),
    filters = c("with_exon_transcript",
    "with_protein_id", "with_transcript_variation"),
    values = list(TRUE, TRUE, TRUE),
    ensembl_Human_Genes)[1:25, ]

3
HW1 classical Q&A (GenomeGraphs) (2)
  • What is the gene name (i.e. hgnc_symbol) and
    function represented by the Ensembl ID
    ENSG00000115145?
  • geneInfo <- getBM(c("ensembl_gene_id",
    "hgnc_symbol", "description"),
    filters = c("with_exon_transcript",
    "with_protein_id", "with_transcript_variation"),
    values = list(TRUE, TRUE, TRUE),
    ensembl_Human_Genes)
  • > geneInfo[geneInfo$ensembl_gene_id ==
    "ENSG00000115145", ]
  •      ensembl_gene_id hgnc_symbol description
  • 4829 ENSG00000115145 STAM2 signal transducing
    adaptor molecule (SH3 domain and ITAM motif) 2
  • How many exons does the Ensembl ID
    ENSG00000115145 have? Answer: 51 exons
  • attr(gene, "ens")
  •    ensembl_gene_id ensembl_transcript_id ensembl_exon_id exon_chrom_start exon_chrom_end rank strand biotype
  • 1  ENSG00000115145 ENST00000263904 ENSE00001351655 153032117 153032506 1 -1 protein_coding
  • 2  ENSG00000115145 ENST00000263904 ENSE00002888710 153006659 153006743 2 -1 protein_coding
  • ...
  • 48 ENSG00000115145 ENST00000494589 ENSE00002785037 153004538 153004636 3 -1 protein_coding
  • 49 ENSG00000115145 ENST00000494589 ENSE00002808134 153003676 153003822 4 -1 protein_coding
  • 50 ENSG00000115145 ENST00000494589 ENSE00002929781 153001402 153001471 5 -1 protein_coding
  • 51 ENSG00000115145 ENST00000494589 ENSE00001828491 153000503 153000527 6 -1 protein_coding

4
HW1 classical Q&A (GenomeGraphs) (3)
  • Execute the following command. How many
    chromosomes do you see?
  • 25 chromosomes: 22 autosomes, the X and Y sex
    chromosomes, and one mitochondrial chromosome (MT)
  • Why is the number of chromosomes in this Ensembl
    dataset greater than the 23 chromosome pairs? What
    do MT, X and Y refer to?
  • Because of the MT (mitochondrial) chromosome, and
    because the X and Y sex chromosomes are listed
    separately although they can be grouped into a
    single pair
  • > getBM("chromosome_name", "", "",
    ensembl_Human_Genes)[c(1:22, 433:435), 1]
  • [1] "1" "10" "11" "12" "13" "14" "15" "16" "17"
    "18" "19" "2" "20" "21" "22" "3" "4" "5" "6"
    "7" "8" "9" "MT" "X" "Y"

5
HW2 Pairwise alignments (classical Q&A)

6
HW2 Pairwise alignments (classical Q&A) Q1
  • Align the following DNA sequences globally using
    the Needleman-Wunsch algorithm.
  • Use the following scoring rules: a) gap: -5;
    b) match between two bases: 5; c) mismatch between
    two bases: 3

7
HW2 Pairwise alignments (classical Q&A) Q3
  • Perform a local protein alignment of the
    sequences HEAGAWGHEE and PAWHAE using the BLOSUM62
    matrix. The scoring rules are: a) gap: -8;
    b) matches and mismatches as given in the BLOSUM62
    matrix.

8
HW2 Pairwise alignments (classical Q&A) Q5
  • Produce a dot plot of the Human and Mouse p53
    proteins from the previous question and paste the
    plot below.
  • Complete the lines of R code to get the dot
    plot.
  • Are both proteins similar?
  • Yes, very similar, since we see a clear diagonal
    corresponding to >90% of the sequence length
  • Where does the region of greatest variation
    occur?
  • Between positions 50 and 100
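
The idea behind a dot plot can be sketched in base R. This is only an illustration under stated assumptions: the two short sequences are toy examples (not the real p53 proteins), and the helper name `dot_matrix` is hypothetical rather than taken from the homework.

```r
# Minimal dot-plot sketch in base R (toy sequences, NOT the real p53 proteins).
dot_matrix <- function(seq_a, seq_b) {
  a <- strsplit(seq_a, "")[[1]]
  b <- strsplit(seq_b, "")[[1]]
  # Cell [i, j] is TRUE when residue i of seq_a equals residue j of seq_b
  outer(a, b, "==")
}

dots <- dot_matrix("HEAGAWGHEE", "HEAGCWGHEE")

# Draw one point per matching pair; similar sequences give a clear diagonal.
# The plot is only drawn in an interactive session.
if (interactive()) {
  hits <- which(dots, arr.ind = TRUE)
  plot(hits[, 1], hits[, 2], pch = 15,
       xlab = "Position in sequence A", ylab = "Position in sequence B")
}
```

A break in the main diagonal (here at position 5, where the toy sequences differ) marks a region of variation, which is how the 50-100 region is read off the p53 plot.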

9
HW2 Pairwise alignments (classical Q&A) Q7
  • What global alignment score do you get for the
    two p53 proteins when you use the BLOSUM62
    alignment matrix, a gap opening penalty of -10 and
    a gap extension penalty of -0.5? Answer: a score
    of 1556
  • # query() and getSequence() come from seqinr
    # (after choosebank("swissprot"));
    # pairwiseAlignment() comes from Biostrings
  • query("p53_HUMAN", "AC=P04637")
  • p53_HUMAN_seq <- getSequence(p53_HUMAN)
  • query("p53_MOUSE", "AC=P02340")
  • p53_MOUSE_seq <- getSequence(p53_MOUSE)
  • globalAlign <- pairwiseAlignment(p53_HUMAN_seq,
    p53_MOUSE_seq, substitutionMatrix = "BLOSUM62",
    gapOpening = -10, gapExtension = -0.5)
  • Errors: the R code was not stated, and the
    protein IDs (such as UniProt ID P04637) were not
    given

10
HW2 Computer Style Implementation of NW algorithm in R

11
HW2 Computer style (NW algorithm) 1
  • Given the pseudo-code, implement the NW algorithm
    in R
  • The algorithm has two parts:
  • Calculation of the alignment F-matrix
  • Finding the optimal path(s) through the matrix

for i = 0 to length(A)
  F(i,0) <- d*i
for j = 0 to length(B)
  F(0,j) <- d*j
for i = 1 to length(A)
  for j = 1 to length(B)
    Match  <- F(i-1,j-1) + S(Ai, Bj)
    Delete <- F(i-1,j) + d
    Insert <- F(i,j-1) + d
    F(i,j) <- max(Match, Insert, Delete)
d: gap penalty score; i and j: positions in the A and
B sequences
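
The F-matrix part of the pseudo-code can be sketched in base R as follows. This is one possible sketch, not the reference solution: the function name `nw_fmatrix` is hypothetical, and the scoring (match = 2, mismatch = -1, gap penalty d = -8) is an assumption chosen for illustration.

```r
# Sketch of the F-matrix fill (base R only).
# Assumed scoring: match = 2, mismatch = -1, gap penalty d = -8.
nw_fmatrix <- function(A, B, d = -8, match = 2, mismatch = -1) {
  a <- strsplit(A, "")[[1]]
  b <- strsplit(B, "")[[1]]
  Fm <- matrix(0, nrow = length(a) + 1, ncol = length(b) + 1,
               dimnames = list(c("", a), c("", b)))
  # R is 1-based, so F(i, j) of the pseudo-code is stored in Fm[i + 1, j + 1]
  Fm[, 1] <- d * (0:length(a))   # first column: F(i, 0) = d * i
  Fm[1, ] <- d * (0:length(b))   # first row:    F(0, j) = d * j
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      s <- if (a[i] == b[j]) match else mismatch
      Fm[i + 1, j + 1] <- max(Fm[i, j] + s,       # Match:  F(i-1, j-1) + S
                              Fm[i, j + 1] + d,   # Delete: F(i-1, j) + d
                              Fm[i + 1, j] + d)   # Insert: F(i, j-1) + d
    }
  }
  Fm
}

Fm <- nw_fmatrix("ATCG", "TG")
print(Fm)
```

The bottom-right cell of the returned matrix holds the score of the optimal global alignment under the assumed scoring.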
12
HW2 Computer style (NW algorithm) 2
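  • Fmatrix <- function(A, B) {
  •   fmatrix <- matrix(0, nrow = nchar(A) + 1,
        ncol = nchar(B) + 1)
  •   d <- -8  # this is the gap penalty
  •   for (i in 0:nchar(A))
  •     fmatrix[i + 1, 1] <- d * i  # populates the
        # first column with the gap penalty
  •   for (j in 0:nchar(B))
  •     fmatrix[1, j + 1] <- d * j  # populates the
        # first row with the gap penalty (uses j)
  •   for (i in 1:nchar(A)) {
  •     for (j in 1:nchar(B)) {
  •       # get the score for the pair of aa or nt
          # (assumed: rules() takes the single characters
          # at positions i and j)
  •       score <- rules(substr(A, i, i),
            substr(B, j, j))
  •       match <- fmatrix[i, j] + score
  •       delete <- fmatrix[i, j + 1] + d
  •       insert <- fmatrix[i + 1, j] + d
  •       fmatrix[i + 1, j + 1] <- max(match, delete,
            insert)
  •     }
  •   }
  •   colnames(fmatrix) <- strsplit(paste(" ", B,
        sep = ""), "")[[1]]
  •   # the remaining lines are not shown on the slide;
      # assumed: label the rows and return the matrix
  •   rownames(fmatrix) <- strsplit(paste(" ", A,
        sep = ""), "")[[1]]
  •   fmatrix
  • }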

13
HW2 Computer style (NW algorithm) 3
  • rules <- function(A, B) {
  •   s.matrix <- matrix(rep(0, 16), nrow = 4,
        ncol = 4, byrow = TRUE,
        dimnames = list(c("A","C","G","T"),
        c("A","C","G","T")))
  •   s.matrix["A", ] <- c(2, -1, -1, -1)
  •   s.matrix["C", ] <- c(-1, 2, -1, -1)
  •   s.matrix["G", ] <- c(-1, -1, 2, -1)
  •   s.matrix["T", ] <- c(-1, -1, -1, 2)
  •   s.matrix[A, B]  # assumed return value (not shown
      # on the slide): the score for the pair A, B
  • }

> s.matrix
   A  C  G  T
A  2 -1 -1 -1
C -1  2 -1 -1
G -1 -1  2 -1
T -1 -1 -1  2
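
A more compact way to build and query the same kind of scoring matrix is sketched below. It assumes the scoring of match = 2 and mismatch = -1; the variable names `nt` and `pair_scores` are only illustrative.

```r
# Build a 4 x 4 nucleotide scoring matrix: 2 on the diagonal, -1 elsewhere
nt <- c("A", "C", "G", "T")
s.matrix <- matrix(-1, nrow = 4, ncol = 4, dimnames = list(nt, nt))
diag(s.matrix) <- 2

# Look up single pairs directly by character index (no conversion to numbers)
s.matrix["A", "A"]
s.matrix["G", "T"]

# Look up many pairs at once with a two-column character index matrix
a <- strsplit("ATCG", "")[[1]]
b <- strsplit("ATGG", "")[[1]]
pair_scores <- s.matrix[cbind(a, b)]
```

Using the same order for row and column names keeps the 2s on the diagonal, so the printed matrix is easy to check by eye.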
14
HW2 Computer style (NW algorithm) 4
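  • Check the F-matrix
  • fmatrix <- Fmatrix("ATCG", "TG")
  •         T   G
  •   -32 -32 -32
  • A  -8 -16 -24
  • T -16  -6 -14
  • C -24 -14  -4
  • G -32 -22 -12
  • Note: the first row reads -32 -32 -32 because it
    was filled with d * i after the first loop had
    left i at 4; filling it with d * j gives 0 -8 -16
    and changes the cells that depend on it
  • Start finding the optimal path(s) through the
    matrix
  • AlignmentA <- ""
  • AlignmentB <- ""
  • i <- nchar(A) + 1
  • j <- nchar(B) + 1
  • while (i > 1 || j > 1) {
  •   CurrentScore <- fmatrix[i, j]  # get the score
      # at the current position of the F-matrix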
15
HW2 Computer style (NW algorithm) 5
  • Selecting the bottom-right cell and starting to
    trace back the path of the optimal alignment
  • AlignmentA <- ""
  • AlignmentB <- ""
  • while (i > 1 || j > 1) {
  •   CurrentScore <- fmatrix[i, j]
  •   ScoreDiag <- fmatrix[i - 1, j - 1]
  •   ScoreUp <- fmatrix[i, j - 1]
  •   ScoreLeft <- fmatrix[i - 1, j]
  •   # considering if the score came from the diagonal
      # (matrix row i corresponds to character i - 1)
  •   if (i > 1 && j > 1 && CurrentScore == ScoreDiag +
        s.matrix[substr(A, i - 1, i - 1),
        substr(B, j - 1, j - 1)]) {
  •     AlignmentA <- paste(substr(A, i - 1, i - 1),
          AlignmentA, sep = "")
  •     AlignmentB <- paste(substr(B, j - 1, j - 1),
          AlignmentB, sep = "")
  •     i <- i - 1
  •     j <- j - 1

Which cell of the F-matrix am I in now?
On the diagonal path: previous cell, next cell
16
HW2 Computer style (NW algorithm) 6
  •   # considering if the score comes from the left
      # (introducing a gap)
  •   } else if (i > 1 && CurrentScore == ScoreLeft + d) {
  •     AlignmentA <- paste(substr(A, i - 1, i - 1),
          AlignmentA, sep = "")
  •     AlignmentB <- paste("-", AlignmentB, sep = "")
  •     i <- i - 1
  •   # considering if the score comes from the upper
      # cell (introducing a gap)
  •   } else if (j > 1 && CurrentScore == ScoreUp + d) {
  •     AlignmentA <- paste("-", AlignmentA, sep = "")
  •     AlignmentB <- paste(substr(B, j - 1, j - 1),
          AlignmentB, sep = "")
  •     j <- j - 1
  •   }
  • }
  • print(AlignmentA)
  • print(AlignmentB)
  • finalScore <- fmatrix[nchar(A) + 1, nchar(B) + 1]
  • cat("Final score ", finalScore, "\n")
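
Putting the two parts together, the fill and the trace-back can be sketched as one self-contained base R function. The name `nw_align` and the scoring (match = 2, mismatch = -1, d = -8) are assumptions made for this sketch; it returns a single optimal alignment, preferring the diagonal move when several moves tie.

```r
# Sketch: Needleman-Wunsch fill plus trace-back (base R only).
# Assumed scoring: match = 2, mismatch = -1, gap penalty d = -8.
nw_align <- function(A, B, d = -8, match = 2, mismatch = -1) {
  a <- strsplit(A, "")[[1]]
  b <- strsplit(B, "")[[1]]
  score <- function(x, y) if (x == y) match else mismatch

  # Part 1: fill the F-matrix (row i + 1 corresponds to character i)
  Fm <- matrix(0, length(a) + 1, length(b) + 1)
  Fm[, 1] <- d * (0:length(a))
  Fm[1, ] <- d * (0:length(b))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      Fm[i + 1, j + 1] <- max(Fm[i, j] + score(a[i], b[j]),
                              Fm[i, j + 1] + d,
                              Fm[i + 1, j] + d)
    }
  }

  # Part 2: trace back from the bottom-right cell to the top-left cell
  i <- length(a) + 1
  j <- length(b) + 1
  alnA <- character(0)
  alnB <- character(0)
  while (i > 1 || j > 1) {
    if (i > 1 && j > 1 &&
        Fm[i, j] == Fm[i - 1, j - 1] + score(a[i - 1], b[j - 1])) {
      alnA <- c(a[i - 1], alnA); alnB <- c(b[j - 1], alnB)  # diagonal move
      i <- i - 1; j <- j - 1
    } else if (i > 1 && Fm[i, j] == Fm[i - 1, j] + d) {
      alnA <- c(a[i - 1], alnA); alnB <- c("-", alnB)       # gap in B
      i <- i - 1
    } else {
      alnA <- c("-", alnA); alnB <- c(b[j - 1], alnB)       # gap in A
      j <- j - 1
    }
  }
  list(A = paste(alnA, collapse = ""),
       B = paste(alnB, collapse = ""),
       score = Fm[length(a) + 1, length(b) + 1])
}

res <- nw_align("ATCG", "TG")
print(res)
```

The loop runs until both indices reach the top-left cell, and the bounds checks (`i > 1`, `j > 1`) keep it from reading outside the matrix along the first row or column.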

17
HW2 Computer style (NW algorithm) 7
  • The scoring matrices could have been accessed
    through character indices, which requires no
    conversion and makes the code faster
  • How would one output more than one best possible
    alignment?
  • Please use more comments in your R code
  • It would be nice to see the trace-backs visually
  • Also, the scoring rules were not stated clearly
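
One way to output more than one best alignment is to follow every move that reproduces the current score, not only the first one. The recursive sketch below is a hedged illustration: the name `nw_all_alignments` and the scoring used in the example (match = 1, mismatch = -1, d = -1) are assumptions, and the number of optimal alignments can grow very quickly for long sequences.

```r
# Sketch: enumerate ALL optimal global alignments (base R only).
nw_all_alignments <- function(A, B, d = -1, match = 1, mismatch = -1) {
  a <- strsplit(A, "")[[1]]
  b <- strsplit(B, "")[[1]]
  score <- function(x, y) if (x == y) match else mismatch

  # Fill the F-matrix (row i + 1 corresponds to character i)
  Fm <- matrix(0, length(a) + 1, length(b) + 1)
  Fm[, 1] <- d * (0:length(a))
  Fm[1, ] <- d * (0:length(b))
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      Fm[i + 1, j + 1] <- max(Fm[i, j] + score(a[i], b[j]),
                              Fm[i, j + 1] + d,
                              Fm[i + 1, j] + d)
    }
  }

  # Recursive trace-back: branch on every move consistent with the score
  walk <- function(i, j, alnA, alnB) {
    if (i == 1 && j == 1) return(list(c(A = alnA, B = alnB)))
    out <- list()
    if (i > 1 && j > 1 &&
        Fm[i, j] == Fm[i - 1, j - 1] + score(a[i - 1], b[j - 1])) {
      out <- c(out, walk(i - 1, j - 1,
                         paste0(a[i - 1], alnA), paste0(b[j - 1], alnB)))
    }
    if (i > 1 && Fm[i, j] == Fm[i - 1, j] + d) {
      out <- c(out, walk(i - 1, j, paste0(a[i - 1], alnA), paste0("-", alnB)))
    }
    if (j > 1 && Fm[i, j] == Fm[i, j - 1] + d) {
      out <- c(out, walk(i, j - 1, paste0("-", alnA), paste0(b[j - 1], alnB)))
    }
    out
  }
  walk(length(a) + 1, length(b) + 1, "", "")
}

alns <- nw_all_alignments("AA", "A")
for (aln in alns) cat(aln[["A"]], "\n", aln[["B"]], "\n\n", sep = "")
```

With these assumed scores, "AA" against "A" has two equally good alignments, because the single A can pair with either A of the longer sequence.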