Basic principles of probability theory

1
Name: Garib Murshudov (when asking questions, "Garib" is sufficient)
E-mail: garib_at_ysbl.york.ac.uk
Location: Bioscience Building (New Biology), K066
Webpage for lecture notes and exercises: www.ysbl.york.ac.uk/garib/mres_course/2006/
There will be two types of exercises:
  • With numbers. They will be marked.
  • With names. You can do them and I will mark them.
You can send questions about this course, and other questions I can help with, to the above e-mail address.
2
Additional materials
  • Linear and matrix algebra
      • Eigenvalue/eigenvector decomposition
      • Singular value decomposition
      • Operations on matrices and vectors
  • Basics of probabilities and statistics
      • Probability concept
      • Characteristic/moment generating/cumulant
        generating functions
      • Entropy and maximum entropy
      • Some standard distributions (e.g. normal, t, F,
        chisq distributions)
      • Point and interval estimation
      • Elements of hypothesis testing
      • Sampling and sampling distributions
  • Optimisation techniques
      • Gradient methods
      • Super-linear and second order techniques

3
Introduction to R
  • Examples of analysis in this course will be done
    using R. You can use any package you are familiar
    with. However, I may not be able to help in these
    cases.
  • R is a multipurpose statistical package. It is
    freely available from
  • http://www.r-project.org/
  • Or just type R into a Google search. The first
    hit is usually a link to R.
  • It should be straightforward to download.
  • R is an environment (in unix terminology it is
    some sort of shell) that offers everything from
    simple calculations to sophisticated statistical
    functions.
  • You can run programs available in R or write your
    own scripts using these programs. You can also
    write a program in your favourite language
    (C, C++, FORTRAN) and put it in R.
  • If you have the mind of a programmer then it is
    perfect for you. If you have the mind of a user it
    gives you very good options to do what you want
    to do.
  • Here I give a very brief introduction to some of
    the commands of R. During the course I will give
    some other useful commands for each technique.

4
To get started
  • If you are using Windows: once you have
    downloaded R (the University already has it),
    you can either follow the path
    Start/Programs/R or, if you have a shortcut to R,
    double-click that icon. You will then have an
    R window.
  • If you are using unix/linux/MacOS: after
    defining the path where the R executables are, just
    type R in one of your windows. Usually the path is
    defined when R is installed.
  • Useful commands for beginners
  • help.start()
  • will usually start a web browser and you can
    start learning. A very useful section is "An
    Introduction to R". There is also a search
    engine.
  • To get information about a command you just type
  • ?command
  • It will give some sort of help (sometimes helpful
    help).
  • command
  • Typing the name of a command without parentheses
    prints its R script if available. Reading these
    scripts may help you to write your own script or
    program.

5
Simple commands: assignment
  • The simplest command is that of assignment
  • v = 5.0
  • or
  • v <- 5.0
  • The value of the variable v will become 5.0.
    (Although there are several ways to do assignment,
    I will always use =.)
  • If you type
  • v = c(1.0,2.0,10.0,1.5,2.5,6.5)
  • it will make a vector of length 6.
  • If you type
  • v
  • R will print the value(s) of the variable v.
  • v = c("mine","yours","his/hers","theirs","its")
  • will create a vector of characters. The type of a
    variable is defined on the fly.
  • To access a particular value of a vector use, for
    example,
  • v[1]  (the first element)
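
The commands on this slide can be tried together as a short script. This is only a minimal sketch: the second variable name w is an illustrative choice, and the script uses the <- form of assignment, which is equivalent to = here.

```r
# Assign a single number, then overwrite v with a numeric vector
v <- 5.0
v <- c(1.0, 2.0, 10.0, 1.5, 2.5, 6.5)

n <- length(v)      # number of elements in v
first <- v[1]       # indexing starts at 1, so this is the first element
middle <- v[2:4]    # elements 2, 3 and 4

# A character vector; the type is decided by the values assigned
w <- c("mine", "yours", "his/hers", "theirs", "its")

print(class(v))     # type of the numeric vector
print(class(w))     # type of the character vector
```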

6
To create a matrix
  • The simplest way to create a matrix is to create
    a vector then convert it to a matrix
  • c = vector(length=100)
  • c = 1:100 (The values of c will become integers
    from 1 to 100)
  • dim(c) = c(5,20)
  • c
  • The dim command will work whenever you have a
    vector of matching length. The resulting c will be
    a matrix with dimensions 5x20.
  • You can also use
  • d = matrix(c,5,20) or d = matrix(c,nrow=5)
    or d = matrix(c,ncol=20)
  • d
  • then c will be kept intact and d will become a
    matrix. You can also give names to the columns
    and rows (LETTERS is a built-in vector of the
    English letters)
  • rownames(d) = LETTERS[1:5]
  • colnames(d) = LETTERS[1:20]
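
As a rough illustration of the two routes to a matrix described above, the sketch below uses the name x instead of c (an assumption made here only to avoid confusion with the function c()). It also shows that R fills matrices column by column.

```r
# Route 1: give an existing vector a dim attribute
x <- 1:100
dim(x) <- c(5, 20)        # x is now a 5 x 20 matrix

# Route 2: build a new matrix and leave the source vector alone
d <- matrix(1:100, nrow = 5)

# Name the rows A..E and the columns A..T
rownames(d) <- LETTERS[1:5]
colnames(d) <- LETTERS[1:20]

# Values are filled column by column, so row 2 of column 3 holds 12
print(d["B", "C"])
print(dim(d))
```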

7
Simple calculations: arithmetic
  • Almost all elementary functions are available
  • exp(v)
  • log(v)
  • tan(v)
  • cos(v) and others
  • These functions are applied to all elements of
    the vector (or matrix). The result has the same
    dimensions as the argument. A call will of course
    fail if v is a vector of characters and you are
    trying to use a function with a real argument.
    If the values are outside the domain of the
    function, the result is NaN with a warning.
  • Apart from elementary functions there are many
    built-in special functions like Bessel functions
    (besselI(x,n), besselK(x,n) etc.), gamma functions
    and many others. Just have a look at help.start()
    and use the search engine and keywords.
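
A small sketch of the behaviour described on this slide. The variable names are illustrative, and tryCatch is used only so that the script can continue after the deliberate error.

```r
v <- c(1.0, 2.0, 10.0)
ev <- exp(v)                       # applied to every element of v

m <- matrix(1:6, nrow = 2)
log_m <- log(m)                    # the result keeps the 2 x 3 shape
print(dim(log_m))

# Outside the domain: the result is NaN (R also issues a warning)
out_of_range <- suppressWarnings(log(-1))

# A character argument is an error; catch it so the script can continue
char_result <- tryCatch(exp("a"), error = function(e) "error")

# Two of the built-in special functions
g <- gamma(5)                      # gamma(5) = 4! = 24
b <- besselI(1, 0)                 # modified Bessel function I of order 0 at x = 1
print(c(g, b))
```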

8
Two more commands for sorting
  • There are two commands for sorting. One of them
    is
  • sort(randu[,1])
  • It just sorts the data in ascending order. It
    has limited use. Another, more important one
    does not sort but creates a vector of indices
    that corresponds to the sorted data. That is
  • order(randu[,1])
  • It gives the positions of the ordered data. It can
    now be used to access the data in an ordered form.
    sort(data) and data[order(data)] are equivalent.
  • randu[order(randu[,1]),]
  • will reorder the rows of the data so that the
    first column is sorted.
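
The relation between sort and order can be checked directly. This is a minimal sketch using the built-in randu data; the names idx and randu_sorted are illustrative.

```r
data(randu)                  # built-in data set with columns x, y, z
x <- randu[, 1]

idx <- order(x)              # positions of the values in ascending order
same <- identical(sort(x), x[idx])
print(same)                  # sort(x) and x[order(x)] agree

# Reorder whole rows so that the first column is ascending
randu_sorted <- randu[idx, ]
print(head(randu_sorted))
```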

9
Reading from files
  • The simplest way of reading a table from a file
    is to use
  • d = read.table("name of the file")
  • It will read the table from the file (you may
    have some problems if you are using Windows). Do
    not forget to put an end of line after the final
    line if you are using Windows.
  • scan is also a useful command for reading.
  • d = scan("name of the file")
  • There are also commands for other file formats,
    for example read.csv (comma-separated values) and
    read.csv2 (semicolon-separated values with a
    decimal comma).
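
The reading commands can be tried without any existing data file. The sketch below first writes small temporary files and then reads them back. The file contents and variable names are made up for illustration, and header = TRUE is an assumption that the first line holds column names.

```r
# Write a small whitespace-separated table to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("x y", "1 2.5", "3 4.5"), path)
d <- read.table(path, header = TRUE)   # data frame with columns x and y
print(d)

# scan reads a plain stream of numbers into a vector
nums_path <- tempfile()
writeLines(c("1 2 3", "4 5 6"), nums_path)
s <- scan(nums_path, quiet = TRUE)
print(s)

# read.csv reads comma-separated files with a header line
csv_path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2.5", "3,4.5"), csv_path)
d_csv <- read.csv(csv_path)
print(d_csv)
```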

10
Built in data
  • R has numerous built-in datasets. You can view
    them using
  • data()
  • You can pick one of them and play with it. It is
    always a good idea to have a look at what kind of
    data you are working with. There are also help
    pages for R datasets
  • data(DNase)
  • ?DNase
  • It will print information about DNase.
  • You can list all available data sets using
  • data(package = .packages(all.available = TRUE))
  • To take a data set from another package you can
    load the corresponding library using
  • library(name of library)
  • and then you can read the data set. This command
    will also load all functions in that library
  • Once you have data you can start analysing them
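
A first look at a built-in data set might go as follows. This is only a sketch of one possible routine, using the DNase data mentioned above.

```r
data(DNase)                 # load the built-in data set
print(dim(DNase))           # number of rows and columns
print(names(DNase))         # column names
print(head(DNase))          # first few rows
print(summary(DNase$density))
```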

11
Simple statistics
  • The simplest statistics you can use are the mean,
    variance and standard deviation
  • data(randu)
  • mean(randu[,2])
  • var(randu[,2])
  • sd(randu[,2])
  • will calculate the mean, variance and standard
    deviation of column 2 of the data randu
  • Another useful command is
  • summary(randu[,2])
  • It gives the minimum, 1st quartile, median, mean,
    3rd quartile and maximum values
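
As a quick check of how these statistics relate, the sketch below confirms that sd is the square root of var. The variable names are illustrative.

```r
data(randu)
y <- randu[, 2]

m <- mean(y)
v <- var(y)
s <- sd(y)

# The standard deviation is the square root of the variance
print(all.equal(s, sqrt(v)))

# Minimum, 1st quartile, median, mean, 3rd quartile, maximum
sm <- summary(y)
print(sm)
```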

12
Simple two sample statistics
  • Covariance between two samples
  • cov(randu[,1],randu[,2])
  • Correlation between two samples
  • cor(randu[,1],randu[,2])
  • When you have a matrix (columns are variables and
    rows are observations)
  • cov(randu)
  • will calculate covariances between columns
  • cor(randu)
  • will calculate correlations between columns
  • If instead the rows are variables (and the columns
    are observations) then you can use the transpose
    of the matrix
  • cov(t(randu))
  • (for randu itself, whose columns are the
    variables, this would give covariances between
    the 400 observations)
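
The sketch below ties the two-sample and matrix forms together. The matrix by_rows is a hypothetical example of data stored with variables in rows, built here simply by transposing randu.

```r
data(randu)
x <- randu[, 1]
y <- randu[, 2]

cxy <- cov(x, y)
rxy <- cor(x, y)

# Correlation is covariance scaled by the two standard deviations
rxy_manual <- cxy / (sd(x) * sd(y))

C <- cov(randu)             # 3 x 3 matrix, one entry per pair of columns
R <- cor(randu)

# Hypothetical layout with variables in rows: transpose before calling cov
by_rows <- t(randu)         # 3 x 400 matrix
C_from_rows <- cov(t(by_rows))

print(C)
print(R)
```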

13
Simple plots
  • There are several useful plot functions. We will
    learn some of them during the course. Here are
    the simplest ones
  • plot(randu[,2])
  • Plots values vs indices. The x axis is the index
    of a data point and the y axis is its value

14
Simple plots: boxplot
  • Another useful plot is the boxplot.
  • boxplot(randu[,2])
  • It produces a boxplot. It is a useful plot that
    may show extreme outliers and the overall behaviour
    of the data under consideration. It plots the
    median, the 1st and 3rd quartiles, and whiskers
    that reach the minimum and maximum values when
    there are no outliers. In some sense it is a
    graphical representation of the command summary
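
The numbers behind a boxplot can be inspected without drawing anything. In this sketch plot = FALSE asks boxplot to return its statistics only. Note that the box edges are "hinges", which are close to, but not always identical to, the quartiles reported by summary.

```r
data(randu)
y <- randu[, 2]

# Compute the boxplot statistics without opening a graphics device
b <- boxplot(y, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)

# Compare with the numerical summary of the same data
print(summary(y))
print(b$out)                # points drawn individually as outliers, if any
```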

15
Simple plots: histogram
  • Histogram is another useful command. It may give
    some idea about the underlying distribution
  • hist(randu[,2])
  • will plot a histogram. The x axis is the value of
    the data and the y axis is the number of occurrences
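
The counts that a histogram displays can also be obtained as numbers. This is a minimal sketch using plot = FALSE, so no graphics window is needed.

```r
data(randu)
y <- randu[, 2]

h <- hist(y, plot = FALSE)  # compute the bins without drawing
print(h$breaks)             # bin boundaries (x axis)
print(h$counts)             # number of occurrences per bin (y axis)

# Every data point falls in exactly one bin
print(sum(h$counts) == length(y))
```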

16
Simple plots: qqplot
  • A useful way of checking if data obey a particular
    distribution
  • qqnorm(randu[,2])
  • is useful to see if the distribution is normal.
    For normal data the plot should be close to a
    straight line. For randu it clearly is not, so
    the data are not normal

17
Simple qqplot
  • Let us test another one: the uniform distribution
  • qqplot(randu[,2],runif(1000))
  • runif is a random number generator from the
    uniform distribution. It is a useful command.
  • The result looks much better
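
One rough way to put a number on "looks much better" is to compute the correlation between the paired quantiles of each plot. This is an illustrative check chosen here, not a formal test, and set.seed is used only to make the random sample reproducible.

```r
data(randu)
y <- randu[, 2]
set.seed(1)                 # reproducible uniform sample

# Quantile pairs for the two plots, computed without drawing
qn <- qqnorm(y, plot.it = FALSE)                # against the normal
qu <- qqplot(y, runif(1000), plot.it = FALSE)   # against a uniform sample

# The closer the points are to a straight line, the nearer this is to 1
r_norm <- cor(qn$x, qn$y)
r_unif <- cor(qu$x, qu$y)
print(c(normal = r_norm, uniform = r_unif))
```

A higher value for the uniform comparison agrees with the visual impression from the two plots.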

18
Further reading
  • An Introduction to R (the manual distributed with R)
  • Dalgaard, P. Introductory Statistics with R