Dr Richard White - PowerPoint PPT Presentation

1
Dr Richard White
Basic Computing Concepts for Bioinformatics
2
Basic computing concepts
  • "Basic Computing Concepts" sounds a bit scary. I
    hope you'll find it isn't really.
  • Actually, you'll be familiar with some of the
    Basic Computing Concepts already.

3
What does the average biologist use computers for?
  • Browsing the Web, searching with Google, etc.
  • Email
  • Word-processing for reports, etc. (e.g. MS Word)
  • Data handling and simple statistics (e.g. Excel)
  • Playing CDs, games, etc. (no, of course not,
    only joking)
  • So you're probably quite experienced in computer
    use already.
  • Databases? The use of bioinformatics databases
    figures prominently in this course.

4
What should biologists use computers for?
  • Access to biological databases, especially those
    containing bioinformatics information
  • Visualisation: ways to understand data better by
    visual exploration
  • Analysing data, especially to test hypotheses (to
    understand biology better)

5
Computer use during this course
  • Mostly we'll be concerned with access to
    biological databases
  • Also some visualisation sessions and maybe some
    data analysis and hypothesis testing

6
Using predefined tools
  • You'll be doing a lot of this work using tools
    available on the web.
  • This makes life easy, because the hard work of
    setting these tools up for use has already been
    done by someone else.
  • However, sometimes it's useful to get your hands
    dirty and mess about with the data and ways to
    process it yourself, especially if you want to
    do something that zillions of other people
    haven't already thought of.

7
Use of databases
  • I'll be running a session on the use of databases
    in week 4, but at the moment I want to think
    about this in order to discover some Basic
    Computing Concepts.
  • First, let's consider the characteristics of
    databases for a moment.

8
Simple database concepts
  • Computers allow the analysis of large data sets.
    These are frequently arranged as two-dimensional
    data tables, based on the convention that:
  • each row holds information on a separate object
    (or abstract entity such as a species),
  • each column holds information on a particular
    property or characteristic of the objects,
  • in general there will be a single value in each
    cell of the table, representing the value of a
    specific characteristic for one particular
    object.

9
Spreadsheets
  • Data in the form of two-dimensional tables is
    frequently analysed using computer spreadsheet
    programs such as Microsoft Excel, especially
    where the purpose is:
  • relatively simple data reorganisation,
  • summarisation,
  • statistical testing, or
  • report generation.

10
Databases
  • It is becoming harder to distinguish between
    spreadsheet and database programs.
  • Most databases require more than one table. For
    example, one table may store data about proteins
    and another table may store data about the
    species these proteins are found in.
  • For more about database systems, see the
    PowerPoint presentation (DatabaseIntroduction.ppt)
    on my web site (see handout for details).

11
Methods for using databases
  • What methods exist to use databases?
  • Basically, there are several approaches to the
    use of databases.

12
Database use 1: direct access to database tables
  • Run your own database on your own computer (e.g.
    MS Access)
  • Use a program on your PC which gives you direct
    access to the tables in the remote database
    (client-server database access)
  • In both cases, you need to know what the tables
    are and what they contain, and you need a way to
    give instructions to the database, such as SQL.

13
SQL statements
  • SQL (Structured Query Language) is a language
    for specifying the creation of databases and the
    updating and retrieval of information in them.
    It is general and portable so that it can be
    used with a variety of different database systems
    without having to learn a new language for each
    one.
  • The language goes far beyond the scope of this
    course. Briefly, it can be used to:
  • specify the tables in the database and the fields
    (columns) they contain,
  • make additions and updates to the data in those
    tables,
  • retrieve information from one or more of the
    tables.

14
SQL for data retrieval
  • A typical SQL statement for data retrieval would
    look something like this:
      SELECT <some fields> FROM <table> WHERE <condition>
  • The condition effectively selects certain rows
    from the table.
  • Thus the result is often a smaller table than the
    one being queried.
  • Tables can be joined together to combine
    information from more than one table, for example
    when extracting a molecular sequence from one
    table and the bibliographic details of the
    reference to where it was published from another
    table.
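
If you want to see what "selecting rows" and "joining tables" amount to, here is a rough sketch in plain Perl (the language introduced later in this lecture). It is only an analogy, not real SQL and not how a database works internally. The table names, field names and values are all made up for illustration.

```perl
use strict;
use warnings;

# Two hypothetical tables. Each row is a hash whose keys are the
# field (column) names. All names and values here are invented.
my @sequences = (
    { seq_id => 'S1', organism => 'Homo sapiens', length => 120, ref_id => 'R1' },
    { seq_id => 'S2', organism => 'Mus musculus', length => 95,  ref_id => 'R2' },
    { seq_id => 'S3', organism => 'Homo sapiens', length => 310, ref_id => 'R1' },
);
my %references = (
    R1 => { journal => 'Journal A', year => 2001 },
    R2 => { journal => 'Journal B', year => 1998 },
);

# Roughly: SELECT ... FROM sequences WHERE organism = 'Homo sapiens'
# The condition keeps only the rows that satisfy it.
my @selected = grep { $_->{organism} eq 'Homo sapiens' } @sequences;

# Roughly a join: for each selected row, look up the matching
# reference through the shared ref_id value.
my @joined = map {
    +{
        seq_id  => $_->{seq_id},
        length  => $_->{length},
        journal => $references{ $_->{ref_id} }{journal},
    }
} @selected;

# Print the resulting (smaller) table, one row per line.
printf "%s\t%d\t%s\n", @{$_}{qw(seq_id length journal)} for @joined;
```

The result has two rows instead of three, which is the "smaller table" mentioned above, and each row carries a field that came from the second table.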

15
Database use 2: predefined operations
  • Alternatively, you might have forms and queries
    already set up for you, which you can just run in
    order to perform predefined kinds of searches.
    These predefined operations can be made directly
    available to you by:
  • Browsing a web page, typically containing a form,
    which gives you access (NPI) to a database
    somewhere else. You've done this if you've ever
    bought anything on the Internet.
  • Using or even writing a small program (sometimes
    called a "script" to make it seem less scary) to
    fetch the data for you. This allows you to
    process the data in useful ways:
  • to search for features you're interested in,
  • to summarise the data in the way you want, or
  • to extract data for statistical analysis to test
    hypotheses.

16
Database use 3: using predefined operations
  • The predefined operations may be packaged as CGI
    programs or Web Services or in a variety of other
    ways, but basically you just send a request to
    the service, optionally with some parameters to
    specify what you want, and wait for the reply.
  • The reply usually comes back either
  • in HTML (as a web page containing the data
    requested), or
  • as some other sort of file to be downloaded (i.e.
    stored on your PC), either
  • in one of a number of formats invented by the
    data providers, or
  • in XML, a standard but flexible (and verbose) way
    to structure a data file, so that other programs
    (rather than humans) can process it easily.

17
Overview of NCBI Entrez
  • In a later session, you'll be introduced to a
    number of bioinformatics databases, but it's
    worth spending a moment looking at a popular way
    to make use of some of them, because you will
    explore this in Practical 2 in week 4 of this
    course.
  • NCBI web site
  • Entrez utilities

18
Brief introduction to Perl programming
  • (What? In ten minutes??)
  • This will help you prepare for Practical 2 (the
    practical part of the 4th week of the course), in
    which we shall use simple Perl programs to
    request data from a bioinformatics information
    provider such as NCBI, by connecting with their
    Entrez utilities. (Additional Perl tutorial
    material may be made available.)
  • What is a Perl program (or script)?
  • How to run one
  • How to write one
  • What do you need? See the handout

19
A computer program
  • A program is a set of instructions to the
    computer, such as:
  • Get input from user
  • Perform calculation
  • Display window
  • React to mouse click
  • These are instructions at a very high level.
    They need to be broken down into smaller details.
    A program consists of combinations of:
  • Sequences of instructions (statements)
  • Repetitions (to execute statements repeatedly)
  • Selections (to choose which statements to
    execute)
  • Functions (subroutines or methods: groups of
    instructions)
20
A simple program
  • Here is a simple Perl program:
      #!/usr/local/bin/perl
      # Program to do the obvious
      print 'Hello world.';
  • The first line: most Perl programs start off
    with this as their very first line, although it
    may vary from system to system, or not be used at
    all. It tells the machine what to do with the
    file when it is executed (it tells it to run the
    file through the Perl software to execute it).
  • Comments start with a # symbol. Everything which
    is not a comment is a Perl statement, which must
    end with a semicolon, like the last line above.
  • So the next thing to do is to run it.

21
Running the program
  • Type in the example program using a text editor,
    and save it in a file called something.pl.
  • Now to run the program just type the following at
    the Command Prompt:
      perl something.pl
  • If something goes wrong then you may get error
    messages, or you may get nothing at all.

22
Perl programming concepts: variables
  • Variables can hold both strings and numbers. For
    example, the statement
      $priority = 9;
  • sets the scalar variable $priority to 9, but you
    can also assign a string to exactly the same
    variable:
      $priority = 'high';
  • In general, variable names consist of numbers,
    letters and underscores, but they should not
    start with a number. Perl is case sensitive, so
    $a and $A are different variables.

23
Operations and Assignment
  • Perl uses all the usual arithmetic operators:
      $a = 1 + 2;    # Add 1 and 2 and store in $a
      $a = 3 - 4;    # Subtract 4 from 3 and store in $a
      $a = 5 * 6;    # Multiply 5 and 6
      $a = 7 / 8;    # Divide 7 by 8 to give 0.875
  • etc.
  • and for strings Perl has the following, among
    others:
      $a = $b . $c;  # Concatenate $b and $c
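
Here is a short sketch that puts scalar variables and these operators together in one runnable file. The variable names and the numbers are invented for illustration. The lines "use strict" and "use warnings" are not on the slides; I have added them because they make Perl report many common mistakes, and they are the reason each variable is introduced with "my".

```perl
use strict;
use warnings;

# Made-up counts for an imaginary DNA sequence.
my $gc_count = 42;     # number of G and C letters
my $length   = 120;    # total number of letters

# Arithmetic: division gives a fraction, multiplication a percentage.
my $fraction = $gc_count / $length;
my $percent  = $fraction * 100;

# String concatenation with the dot operator.
my $genus   = 'Homo';
my $species = 'sapiens';
my $name    = $genus . ' ' . $species;

# The same scalar variable can hold a number and later a string.
my $priority = 9;
$priority = 'high';

print "GC content of $name sequence: $percent%\n";
print "Priority: $priority\n";
```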

24
Array variables
  • A slightly more interesting kind of variable is
    the array variable, which is a list of scalars
    (single values, i.e. numbers and strings). Array
    variables have the same format as scalar
    variables except that they are prefixed by an @
    symbol. The statement
      @food = ("apples", "pears", "eels");
  • assigns a three element list to the array
    variable @food.
  • The array is accessed by using indices starting
    from 0, and square brackets are used to specify
    the index. The expression
      $food[2]
  • returns eels. Notice that the @ has changed to a
    $ because $food[2] and eels are scalars, not
    arrays.
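
A few more things you can do with an array, as a minimal sketch. The extra item and the variable names $count, $last and $list are my own additions for illustration.

```perl
use strict;
use warnings;

my @food = ('apples', 'pears', 'eels');

# In scalar context an array gives its number of elements.
my $count = scalar @food;

# $#food is the index of the last element, so this is the last item.
my $last = $food[$#food];

# push adds a new element to the end of the array.
push @food, 'eggs';

# join glues the elements together into one string.
my $list = join(', ', @food);

print "Started with $count items, the last one being $last\n";
print "Now the list is: $list\n";

# Visit each element in turn, with its index.
foreach my $i (0 .. $#food) {
    print "$i: $food[$i]\n";
}
```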

25
File handling
  • Here is a basic Perl program which does the same
    as the UNIX cat or DOS/Windows type command on a
    certain file.
      #!/usr/local/bin/perl
      # Program to open the password file, read it in,
      # print it, and close it again.
      $file = '/etc/passwd';  # Name the file
      open(INFO, $file);      # Open the file
      @lines = <INFO>;        # Read it into an array
      close(INFO);            # Close the file
      print @lines;           # Print the array
26
Control structures
  • Perl supports lots of different kinds of control
    structures. Have a look at the Perl resources
    listed on the handout. Most Perl programs use
    these features.
  • Programs can choose between alternative
    branches
  • Programs can repeat statements until something
    happens
  • Frequently used statements to carry out some
    common task can be made into a subroutine or
    function and called from other parts of the
    program
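
Here is a minimal sketch showing all three ideas at once: a subroutine, a choice between branches, and repetition. The subroutine name gc_fraction, the example sequences and the thresholds of 0.6 and 0.4 are all invented for illustration and have no special biological meaning.

```perl
use strict;
use warnings;

# Subroutine: the fraction of letters in a DNA string that are G or C.
sub gc_fraction {
    my ($dna) = @_;
    return 0 if length($dna) == 0;
    my $gc = ($dna =~ tr/GCgc//);   # tr returns how many letters matched
    return $gc / length($dna);
}

my @sequences = ('ATATAT', 'GGCCGC', 'ATGCGA');
my @labels;

# Repetition: do the same thing for each sequence in the list.
foreach my $dna (@sequences) {
    my $fraction = gc_fraction($dna);

    # Selection: choose between alternative branches.
    my $label;
    if ($fraction > 0.6) {
        $label = 'GC-rich';
    }
    elsif ($fraction < 0.4) {
        $label = 'AT-rich';
    }
    else {
        $label = 'balanced';
    }

    push @labels, $label;
    printf "%s\t%.2f\t%s\n", $dna, $fraction, $label;
}

# Repetition until something happens: keep doubling until we reach 100.
my $n = 1;
while ($n < 100) {
    $n = $n * 2;
}
print "Stopped at $n\n";
```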

27
End