Automatic Information Retrieval from Bioinformatics Websites

Provided by: pk92

1
Automatic Information Retrieval from
Bioinformatics Websites
  • Kang Peng

2
Introduction
  • Bioinformatics Databases
    • SwissProt & TrEMBL (http://us.expasy.org/sprot/)
    • SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
    • InterPro (http://www.ebi.ac.uk/interpro/index.html)
  • Bioinformatics Sequence analysis tools
    • NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
    • PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)
    • MEME (http://meme.sdsc.edu/meme/website/meme.html)
  • Integrated Sites
    • NCBI (http://www.ncbi.nlm.nih.gov/)
    • EBI (http://www.ebi.ac.uk/index.html)

3
The Problem
  • How can information be retrieved automatically or in
    batch mode? For example:
    • Retrieve 10,000 protein sequences from SwissProt
      based on their accession numbers
    • Make secondary structure predictions for 10,000
      protein sequences through the PSIPRED site
    • Find known InterPro patterns (motifs) in the
      10,000 protein sequences through the InterPro
      site
  • Some websites already provide mechanisms for batch
    retrieval, but not all

4
Solution
  • First check whether the website already provides
    what you want!
  • If so, why not use it?
  • Study how the web browser interacts with the
    server
  • Write a program that simulates the web browser to
    communicate with the web server.
  • Now we have full control, so we can do whatever
    we want
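
The last two bullets can be sketched in Python as a simple batch loop. This is only an illustration under stated assumptions: `retrieve_batch` and the `fetch` callable are hypothetical names, and the pause between requests is a courtesy choice rather than something the slides prescribe.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def retrieve_batch(
    ids: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call fetch(id) for every ID and return (results, errors)."""
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for index, item_id in enumerate(ids):
        if index > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause so the server is not flooded
        try:
            results[item_id] = fetch(item_id)
        except Exception as exc:
            # Record the failure and continue; one bad ID should not stop the batch.
            errors[item_id] = str(exc)
    return results, errors
```

Passing the retrieval step in as a function keeps the loop independent of any particular website, so the same loop could drive a sequence download or a prediction request.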

5
The World Wide Web

6
The World Wide Web

7
Common Gateway Interface (CGI)
  • A CGI program
    • Runs on the web server
    • Takes input from the web browser user
    • Can query DBs, run sequence analysis tools, etc.
    • Can format its output as HTML pages
    • Can be written in ASP, JSP, PHP, Perl, or even C
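
To make these bullets concrete, here is a minimal sketch in Python of what a CGI-style program does with its input. Everything specific in it is an assumption for illustration: the function name `handle_request`, the parameter name `id`, and the toy `DATABASE` dictionary. It relies on the CGI convention that the web server passes GET parameters in the `QUERY_STRING` environment variable and expects a header, a blank line, and then the body on standard output.

```python
import html
import os
from urllib.parse import parse_qs

# Toy stand-in for a sequence database (hypothetical content).
DATABASE = {"X00001": "MKTAYIAKQR"}


def handle_request(query_string: str) -> str:
    """Build the full CGI response: header, blank line, HTML body."""
    params = parse_qs(query_string)
    entry_id = params.get("id", [""])[0]
    sequence = DATABASE.get(entry_id)
    if sequence is None:
        body = "<p>No entry for %s</p>" % html.escape(entry_id)
    else:
        body = "<p>%s: %s</p>" % (html.escape(entry_id), html.escape(sequence))
    return "Content-Type: text/html\r\n\r\n<html><body>%s</body></html>" % body


if __name__ == "__main__":
    # The web server passes GET parameters in the QUERY_STRING variable.
    print(handle_request(os.environ.get("QUERY_STRING", "")))
```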

8
HTML FORM
  • To input parameters and/or data for CGI programs
  • Parameters/data are encoded and sent
    • As name=value pairs
    • In the URL - GET method
      • example: http://us.expasy.org/cgi-bin/prosite-search-ful
    • In the HTTP request body - POST method
      • example: http://us.expasy.org/tools/protparam.html
  • Let's examine some examples now
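
The two methods differ only in where the encoded name=value pairs travel. The sketch below shows this with Python's standard `urllib`. The address `http://example.org/cgi-bin/search` and the field names `query` and `format` are placeholders, not real parameters of any ExPASy form. The requests are only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical CGI address and form fields, for illustration only.
CGI_URL = "http://example.org/cgi-bin/search"
fields = {"query": "zinc finger", "format": "text"}

encoded = urlencode(fields)  # "query=zinc+finger&format=text"

# GET: the pairs are appended to the URL after a "?".
get_request = Request(CGI_URL + "?" + encoded)

# POST: the same pairs travel in the request body, as bytes.
post_request = Request(
    CGI_URL,
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```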

9
Automatic Information Retrieval
  • We need to identify the following from the FORM
    section of the web page (by reading the HTML source)
    • The URL of the CGI program
    • Data/parameters for the CGI program
    • Data encoding and request method: in the URL (GET)
      or in the HTTP request body (POST)?
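
Reading the HTML source by hand works, but the same three items can also be pulled out programmatically. The following is a rough sketch using Python's standard `html.parser`. The class name `FormExtractor` and the sample page are invented for illustration, and real pages may need more care (for example, fields that appear outside any form).

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect the action URL, method, and named fields of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attributes.get("action", ""),
                # Browsers default to GET when no method is given.
                "method": (attributes.get("method") or "get").upper(),
                "fields": {},
            })
        elif tag in ("input", "select", "textarea") and self.forms:
            # Simplification: attach the field to the most recently opened form.
            name = attributes.get("name")
            if name:
                self.forms[-1]["fields"][name] = attributes.get("value") or ""


# Hypothetical page source, standing in for a real prediction form.
SAMPLE_PAGE = """
<form action="/cgi-bin/predict" method="post">
  <textarea name="sequence"></textarea>
  <input type="hidden" name="mode" value="fast">
  <input type="submit" value="Run">
</form>
"""

extractor = FormExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.forms)
```

Hidden fields such as `mode` in the sample are easy to miss in the rendered page, which is one reason to inspect the source rather than the visible form.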

10
Automatic Information Retrieval
  • Figure out the interaction process between the
    browser and the server; it could take several
    steps!
  • An example: retrieving the secondary structure
    information for a protein in PDB
    (http://www.rcsb.org/pdb/)
  • Now we can write a program that simulates the web
    browser to communicate with the web server.
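
A multi-step interaction might look like the sketch below: fetch an entry page, find the link that leads to the wanted information, then fetch that second page. The page layout, the link text, and the names `find_result_link` and `run_two_step_query` are all assumptions made for illustration and do not describe the actual PDB site. The `fetch` argument is any function that returns the text of a URL.

```python
import re
from typing import Callable, Optional
from urllib.parse import urljoin


def find_result_link(page_html: str, link_text: str) -> Optional[str]:
    """Return the href of the first <a> whose text contains link_text."""
    pattern = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
    for href, text in pattern.findall(page_html):
        if link_text.lower() in text.lower():
            return href
    return None


def run_two_step_query(
    fetch: Callable[[str], str], start_url: str, link_text: str
) -> str:
    """Step 1: fetch the entry page. Step 2: follow the matching link."""
    first_page = fetch(start_url)
    href = find_result_link(first_page, link_text)
    if href is None:
        raise ValueError("no link containing %r on %s" % (link_text, start_url))
    return fetch(urljoin(start_url, href))  # resolve relative links
```

A regular expression is a fragile way to read HTML and is used here only to keep the sketch short. Sites that track state between steps may also require cookies to be carried from one request to the next.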

11
Implementation
  • Any language that provides a convenient programming
    interface to HTTP
    • Visual Basic: Winsock control, Internet Transfer
      Control
    • Visual C++: WinInet API (for C), WinInet MFC
      classes (for C++)
    • Java: java.net.URLConnection, java.net.URL
    • Linux/Unix shell: lynx
    • Perl: ???
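
Python is not among the languages listed above, but its standard library offers a comparable HTTP interface and can serve as an illustration of what such an interface provides. The sketch below assumes `urllib.request`. The function name `fetch_text` and the User-Agent string are made up for the example.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_text(
    url: str, form_fields: Optional[Dict[str, str]] = None, timeout: float = 30.0
) -> str:
    """GET the URL, or POST form_fields to it when they are given."""
    data = urlencode(form_fields).encode("ascii") if form_fields else None
    request = Request(
        url, data=data, headers={"User-Agent": "batch-retrieval-sketch/0.1"}
    )
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The library handles the connection and the request formatting, so the program only supplies the URL and the form fields.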

12
Examples
  • Please check my webpage at
    http://astro.temple.edu/kangpeng
    (not available until Monday)

13
Thank You!