Homework - PowerPoint PPT Presentation

About This Presentation
Title:

Homework

Slides: 13
Provided by: cbsHan
Transcript and Presenter's Notes

Title: Homework


1
Homework 1
  • Problem 1: Find CpG islands in human chromosome
    21.
  • The formal definition of a CpG island is:
  • It is longer than 200 symbols.
  • The CG (dinucleotide) probability is greater than 0.6 × 1/16.
  • GC content (G and C out of A, T, G, C) is greater than 50%.
  • → This is a moving-average problem.
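The three conditions can be checked directly on a candidate substring. The following is a minimal Python sketch, not the expected solution. It assumes the thresholds are read as length > 200, CG dinucleotide frequency > 0.6 × 1/16, and GC fraction > 0.5. The names `cpg_stats` and `is_cpg_island` are hypothetical.

```python
# Thresholds as read from the slide (assumed interpretation).
MIN_LEN = 200          # island must be longer than this
MIN_GC = 0.5           # fraction of G or C among all bases
MIN_CG = 0.6 / 16      # fraction of adjacent pairs that are "CG"


def cpg_stats(seq):
    """Return (length, GC fraction, CG dinucleotide frequency) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return n, 0.0, 0.0
    gc = sum(1 for base in seq if base in "GC") / n
    # Count "CG" among the n - 1 adjacent pairs.
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG") / (n - 1)
    return n, gc, cg


def is_cpg_island(seq):
    """True if seq satisfies all three conditions (maximality is not checked)."""
    n, gc, cg = cpg_stats(seq)
    return n > MIN_LEN and gc > MIN_GC and cg > MIN_CG
```

This only tests a given substring; finding where such substrings start and end in a whole contig is a separate step.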

2
  • Also, the CpG island should be maximal, in the
    sense that if we extend it by one nucleotide to
    the right or left, it is no longer a CpG island.
  • Two methods:
  • Moving average: create a window and slide it to
    the left or right.
  • Divide the whole sequence into units of, say,
    10 nucleotides. Then calculate the GC content and
    CpG probability of each unit and sum them up. Also
    consider the nucleotides at the unit borders.
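The moving-average method can be sketched as a fixed-width window whose counts are updated incrementally as it slides one nucleotide at a time. This Python sketch is an illustration under assumptions: the window width of 201 and the threshold values are one reading of the definition, and the function names are hypothetical. Merging overlapping windows gives candidate regions only; the maximality condition would still need to be checked on each candidate.

```python
def passing_windows(seq, width=201, min_gc=0.5, min_cg=0.6 / 16):
    """Yield start offsets of windows whose GC and CG fractions exceed the thresholds."""
    seq = seq.upper()
    if len(seq) < width:
        return
    # Initial counts for the first window.
    gc = sum(1 for b in seq[:width] if b in "GC")
    cg = sum(1 for i in range(width - 1) if seq[i:i + 2] == "CG")
    start = 0
    while True:
        if gc / width > min_gc and cg / (width - 1) > min_cg:
            yield start
        end = start + width  # index of the base that enters next
        if end >= len(seq):
            break
        # Slide by one: drop the leftmost base and pair, add the new ones.
        gc -= seq[start] in "GC"
        gc += seq[end] in "GC"
        cg -= seq[start:start + 2] == "CG"
        cg += seq[end - 1:end + 1] == "CG"
        start += 1


def merge_windows(starts, width=201):
    """Merge overlapping or adjacent windows into half-open (start, end) intervals."""
    regions = []
    for s in starts:
        if regions and s <= regions[-1][1]:
            regions[-1][1] = s + width
        else:
            regions.append([s, s + width])
    return [tuple(r) for r in regions]
```

Updating the counts instead of recounting each window keeps the scan linear in the sequence length, which matters for a whole chromosome.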

3
  • Use the hs_chr21.fa file on the lecture page.
  • Your output should contain the location of each
    CpG island, including the contig number and the
    starting and ending positions in that contig.

4
  • Sample from human chromosome 21:
  • >gi|27501117|ref|NT_011512.7|Hs21_11669 Homo
    sapiens chromosome 21 genomic contig
  • CATGTTTCCACTTACAGATCCTTCAAAAAGAGTGTTTCAAAACTGCTCTA
    TGAAAAGGAATGTTCAACTC
  • TGTGAGTTAAATAAAAGCATCAAAAAAAAGTTTCTGAGAATGCTTCTGTC
    TAGTTTTTATGTGAAGATAT
  • TTCCATTTTCTCTATAAGCCTCAAAGCTGTCCAAATGTCCACTTGCAGAT
    ACTACAAAAAGAGTGTTTCA
  • AAAGTGCTCAATGAAAAGGAATGTTCAGCTCTGTGAGTTAAATGCAAACA
    TCACAAATAAGTTTCTGAGA
  • ....
  • >gi|27486021|ref|NT_011515.9|Hs21_11672 Homo
    sapiens chromosome 21 genomic contig
  • AAGCTTCTCAATTTCAGAAATCTTCGGCAGCTTGGGGACATTCAAGGTCA
    CCCTGGGCTCCCAAAGTCAC
  • ACAATTCCATTGGCCACAGCCAAGTTTCCACGTCAGGAGGCTGTGG
    GCTGGGGGTGGCAGCACTGGGTCC
  • TGG
  • NT_011512.7 is the contig number.
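Reading the contigs one at a time makes it straightforward to report positions relative to each contig. The sketch below is in Python and rests on two assumptions: the file is standard FASTA (header lines start with `>`), and the headers are pipe-delimited in the NCBI style `gi|...|ref|NT_011512.7|...`, so that the contig number is the field following `ref`. The function names are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_id(header):
    """Return the field after 'ref' in a pipe-delimited header, or None."""
    fields = header.split("|")
    for i, field in enumerate(fields[:-1]):
        if field == "ref":
            return fields[i + 1]
    return None


# Tiny in-memory example; a real run would open hs_chr21.fa instead.
example = io.StringIO(
    ">gi|27501117|ref|NT_011512.7|Hs21_11669 Homo sapiens chromosome 21 genomic contig\n"
    "CATGTTTCCA\n"
    "CTTACAGATC\n"
)
for header, sequence in read_fasta(example):
    print(contig_id(header), len(sequence))
```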

5
Problem 2
  • You are given prosite.dat, containing pattern
    information, and swprot.fas, the Swiss-Prot
    database in plain file format. You should
    write a Perl program that outputs the
    proteins containing the pattern given on the
    command line. For example, if your program is
    named parse.pl and you want to find all proteins
    in the Swiss-Prot database containing the pattern
    PS00592:
  • perl -w parse.pl PS00592

6
  • This should print out the corresponding proteins.
  • The PS00592 entry in prosite.dat:
  • ID   GLYCOSYL_HYDROL_F9_1; PATTERN.
  • AC   PS00592;
  • DT   DEC-1991 (CREATED); DEC-1992 (DATA UPDATE); JUL-1998 (INFO UPDATE).
  • DE   Glycosyl hydrolases family 9 active sites signature 1.
  • PA   [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.
  • NR   /RELEASE=42.0,135850;
  • NR   /TOTAL=18(18); /POSITIVE=16(16); /UNKNOWN=0(0); /FALSE_POS=2(2);
  • NR   /FALSE_NEG=2; /PARTIAL=1;
  • CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=1;
  • CC   /SITE=11,active_site;
  • DR   P05522, GUN1_PERAE, T; P23666, GUN2_PERAE, T; P28622, GUN4_BACS5, T;
  • DR   P26221, GUN4_THEFU, T; P22699, GUN6_DICDI, T; P22534, GUNA_CALSA, T;
  • DR   P23665, GUNA_FIBSU, T; P10476, GUNA_PSEFL, T; P26225, GUNB_CELFI, T;
  • DR   P23658, GUNC_BUTFI, T; P04954, GUND_CLOTM, T; P26224, GUNF_CLOTM, T;
  • DR   P37700, GUNG_CLOCE, T; Q02934, GUNI_CLOTM, T; P23659, GUNZ_CLOSR, T;
  • DR   P22503, GUN_PHAVU , T;
  • DR   P38534, GUNX_PRUPE, P;
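Looking up the pattern for a given accession number amounts to scanning the entries and collecting the PA lines of the one whose AC line matches. The assignment itself asks for Perl; the Python sketch below only illustrates the logic. It assumes the usual PROSITE flat-file layout: a two-letter line code, the AC value terminated by `;`, entries separated by `//`, and a pattern that may continue over several PA lines and ends with a period. The function name `find_pattern` is hypothetical.

```python
def find_pattern(lines, accession):
    """Return the PA pattern of the entry whose AC line names `accession`, or None."""
    current_ac, pattern_parts = None, []
    for line in lines:
        code, rest = line[:2], line[2:].strip()
        if code == "AC":
            current_ac = rest.rstrip(";")
        elif code == "PA":
            # Long patterns continue on further PA lines; join them later.
            pattern_parts.append(rest)
        elif code == "//":
            if current_ac == accession and pattern_parts:
                return "".join(pattern_parts).rstrip(".")
            current_ac, pattern_parts = None, []
    # Handle a final entry that is not followed by a terminator line.
    if current_ac == accession and pattern_parts:
        return "".join(pattern_parts).rstrip(".")
    return None


entry = [
    "ID   GLYCOSYL_HYDROL_F9_1; PATTERN.",
    "AC   PS00592;",
    "PA   [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.",
    "//",
]
print(find_pattern(entry, "PS00592"))
```

With a real file, `lines` would be the open file handle for prosite.dat rather than a list.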

7
  • AC: the accession number, such as PS00592.
  • Detailed information can be found in the
    PROSITE database.

8
  • The pattern is
  • PA   [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.
  • One sample protein is
  • GUN4_BACS5 (P28622) Endoglucanase 4 precursor
    (EC 3.2.1.4) (Endo-1,4-b
  • MTRRWSFLVQCFTFKKKEGVRSRYMSDYNYVEVLQKSILFYEAQRSGKLP
    ESNRLNWRGD
  • SGLEDGKDVGHDLTGGWYDAGDHVKFGLPMAYSAAVLAWTVYEYREAYEE
    AELLDDMLDQ
  • IKWATDYFLKAHTGPNEFWAQVGDGNADHGWWGPAEVMPMNRPAFKIDEH
    CPGTEVAAQT
  • AAALAAGSIIFKETDAPYAAKLLTHAKQLYAFADQYRGEYTDCVTNAQPF
    YNSWSGYIDE
  • LIWGGIWLYLATNDQTYLNKALKAVEEWPKDWDYTFTMSWDNTFFLSQIL
    LARITKEKRF
  • IESTERNLDYWSTGFVQNGKVERITYTPGGLAWLDQWGSLRYTANAAFLA
    FVYADWVSDQ
  • EKKNRYQTFAIRQTHYMLGDNPQNRSYVVGFGKNPPMHPHHRTAHGSWSN
    QLTTPSSHRH
  • TLYGPLVGGPNRQDQYTDDISDYVSNEVATDYNAAFTGNGAAVWSGQSKL
    PNFPPKEKVE
  • DEFFVEAAVMSNDTTSTQIKAILYNRSGWPARSSQSLSFRYYVNLSEIFA
    KGFTDKDIQV
  • TAVYNEGASLSPLTVYDASSHIYFTEIDFTGVAIFPGGESLHKKEIQFRL
    SAPNGANIWD
  • ASNDYSFQGLTSNMQKTARIPVFDQGDLVFGTLPNK
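A PROSITE pattern maps almost directly onto a regular expression, which is what makes the search over the protein sequences simple. The Python sketch below shows one possible translation; the assignment asks for Perl, where the same substitutions apply. It assumes the standard PROSITE syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition, `<` and `>` for sequence ends) and does not handle rarer cases such as `>` inside brackets. The function name `prosite_to_regex` is hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    regex = pattern.strip().rstrip(".")
    regex = regex.replace("-", "")                       # separators carry no meaning
    regex = regex.replace("x", ".")                      # x matches any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repetition counts
    regex = regex.replace("<", "^").replace(">", "$")    # sequence-end anchors
    return regex


regex = prosite_to_regex("[STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.")
print(regex)  # [STV].[LIVMFY][STV].{2}G.[NKR].{4}[PLIVM]H.R

# Part of the GUN4_BACS5 sequence shown on the slide.
fragment = "EKKNRYQTFAIRQTHYMLGDNPQNRSYVVGFGKNPPMHPHHRTAHGSWSN"
match = re.search(regex, fragment)
if match:
    print(match.start(), match.group())
```

The curly-brace substitution has to run before the parenthesis substitution; otherwise the repetition counts would be rewritten as negated character classes.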

9
(No Transcript)
10
Problem 2) List
11
PERL continued
  • Object
  • A class is a combination of variables and
    functions designed to emulate an object.

12
(No Transcript)