Research Techniques - PowerPoint PPT Presentation

About This Presentation
Title:

Research Techniques

Description:

The Department of Economic and Management Sciences Prof. Andy Mauromoustakos Analysis of multivariate Data using JMP Heavy metals in marine sediments, differences ... – PowerPoint PPT presentation


Transcript and Presenter's Notes


1
Research Techniques Probability Theory
Analysis of multivariate Data using JMP: Heavy
metals in marine sediments, differences between
locations
Presented to: Prof. Andy Mauromoustakos
Presented by: Tareq Altamimi (econ0306)
2
Slides Index
  • Presentation overview.
  • MANOVA Analysis.
  • Why MANOVA.
  • Analysis Results.
  • Conclusion.
  • JMP Analysis.
  • References.
  • MANOVA Tests.
  • Database Overview Objectives.
  • Database Variables definition.
  • Correlation between Variables.
  • Repeated measures Response.
  • Principal components.

3
Presentation Overview
  • This presentation describes an important
    statistical subject, multivariate analysis,
    using the JMP program. The analysis uses a
    database collected by scientists in the
    Department of Biology at the University of
    Melbourne. It concerns heavy metals in marine
    sediments and how they differ between
    locations. We take three different locations
    with four observations (samples) each.
  • JMP makes it easy to see how the heavy metals
    in marine sediments change by presenting the
    data graphically. These charts can be
    interpreted easily by statisticians. This
    presentation also interprets the results of the
    analysis for the three locations covered by the
    database.
  • The analysis in this project uses multivariate
    analysis of variance (MANOVA), which is used to
    see the main and interaction effects of
    categorical variables on multiple dependent
    interval variables. MANOVA uses one or more
    categorical independents as predictors. MANOVA
    tests the differences in the centroid (vector)
    of means of the multiple interval dependents,
    for various categories of the independent(s).
    There are multiple potential purposes for
    MANOVA. All of the analysis here is done by
    MANOVA using the JMP program.

4
Why MANOVA, why Repeated Measures?
Why JMP?
  • JMP dynamically links statistics and graphics so
    you can easily explore data, make discoveries,
    and gain the knowledge you need to make better
    decisions. Click on a point in a graph to
    highlight the corresponding observation
    everywhere it is represented in JMP in other
    graphs, in 3-D spinning plots, and in the data
    tables. JMP provides a comprehensive set of
    statistical tools as well as Design of
    Experiments (DOE) and advanced quality control
    (QC and SPC) tools for Six Sigma in a single
    package. Advanced modelling techniques include
    ANOVA and MANOVA, stepwise, log linear, ordinal
    logistic regression, survival/reliability, true
    non-linear modelling, partitioning (decision
    trees), neural networks, time series,
    multivariate, cluster, discriminant, and partial
    least squares (PLS). The JMP Scripting Language
    (JSL) lets you capture the results of your work
    in automatically-generated scripts, and offers
    all the power of a programming language, complete
    with matrix algebra support, so you can create
    custom analyses, interactive graphics, and more.
  • Multivariate statistics were developed to handle
    situations where multiple variables or measures
    are involved. Any analysis of more than two
    variables or measures can loosely be considered a
    multivariate statistical analysis or multivariate
    analysis. One of the primary goals of
    multivariate statistical analysis is to describe
    the relationships among a set of variables. The
    multivariate analysis is widely used in various
    fields, such as agriculture, food and life
    sciences, business and engineering and so on.
    Repeated measures analysis (also called
    longitudinal data analysis) applies when
    repeated measurements are taken on each subject
    and you want to analyse effects both between
    subjects and within subjects across the
    measurements. This multivariate approach is
    especially important when the correlation
    structure across the measurements is arbitrary.

5
The JMP way of doing things is best summarised in
the following four points:
  • Variables are assigned to one of three levels
    of measurement: nominal, ordinal or continuous.
    This assignment is under user control and may be
    changed at will. JMP uses it to decide what
    summary statistics to provide and what
    techniques are suitable for analysis using the
    variable.
  • In any analysis, each variable may be assigned
    one of the roles X, Y, weight, frequency or
    label.
  • Using a combination of the above two pieces of
    information, JMP is able to decide on an
    appropriate analysis if the user chooses one of
    the following activities, referred to as
    'personalities':
      • Distribution of Y: for single variable
        summaries and plots.
      • Fit Y by X: for one response and one
        explanatory variable. The techniques
        employed are ANOVA, LS regression, logistic
        regression or contingency table analysis,
        depending on the levels of the X and Y
        variables.
      • Fit Model: for variable numbers of
        responses and explanatory variables. Under
        this heading a range of techniques is
        available, including ANOVA, ANCOVA, MANOVA,
        LS regression and stepwise procedures,
        logistic and ordinal regression, log-linear
        models, proportional hazard models,
        screening models and D-optimal designs.
      • Non-linear Fit: for non-linear models
        specified by the user and fit using
        Gauss-Newton or Newton-Raphson, with either
        one of a range of built-in loss functions
        or one specified by the user.
      • Correlation of Ys: for examination of the
        correlation or covariance structure of a
        set of variables, including scatter plot
        matrices, PCA and factor analysis.
      • Cluster: for cluster analysis using
        hierarchical and K-means approaches.
      • Survival: for survival analysis using
        Kaplan-Meier, Cox regression and non-linear
        survival models.
  • Most of the analysis 'personalities' produce
    graphs, many of them dynamic, as part of their
    standard output. There is always a range of
    additional outputs, both textual and graphical,
    available from analyses.
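
As a rough illustration of how Fit Y by X can pick a technique from
the levels of the Y and X variables, the choice can be written as a
lookup table. This is a minimal Python sketch, not JMP code: the
function name fit_y_by_x_technique is hypothetical, and treating
ordinal variables like nominal ones is a simplifying assumption.

```python
# Hypothetical sketch: choose a technique from the measurement levels of
# Y (response) and X (explanatory variable), as Fit Y by X is described.
# Assumption: ordinal variables are handled like nominal ones here.

TECHNIQUES = {
    ("continuous", "continuous"): "LS regression",
    ("continuous", "nominal"): "ANOVA",
    ("nominal", "continuous"): "logistic regression",
    ("nominal", "nominal"): "contingency table analysis",
}


def fit_y_by_x_technique(y_level, x_level):
    """Return the technique for one response and one explanatory variable."""

    def simplify(level):
        if level not in ("nominal", "ordinal", "continuous"):
            raise ValueError(f"unknown level of measurement: {level!r}")
        return "nominal" if level == "ordinal" else level

    return TECHNIQUES[(simplify(y_level), simplify(x_level))]


# In the sediment data, a metal concentration is continuous and the
# site is nominal, so a single metal by site would be an ANOVA.
print(fit_y_by_x_technique("continuous", "nominal"))
```

With one continuous response and a nominal factor the sketch selects
ANOVA; MANOVA, used later in this presentation, extends this to
several responses at once through Fit Model.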

6
MANOVA Tests
  • There are multiple potential purposes for
    MANOVA:
      • To compare groups formed by categorical
        independent variables on group differences
        in a set of interval dependent variables.
      • To use lack of difference for a set of
        dependent variables as a criterion for
        reducing a set of independent variables to
        a smaller, more easily modeled number of
        variables.
      • To identify the independent variables which
        differentiate a set of dependent variables
        the most.
  • MANOVA has four main tests, described as
    follows:
      • Hotelling's T-Square is the most common,
        traditional test where there are two groups
        formed by the independent variables. Note
        that one may see the related statistic,
        Hotelling's Trace (a.k.a. Lawley-Hotelling
        or Hotelling-Lawley Trace). To convert from
        the Trace coefficient to the T-Square
        coefficient, multiply the Trace coefficient
        by (N-g), where N is the sample size across
        all groups and g is the number of groups.
        The T-Square result will still have the
        same F value, degrees of freedom, and
        significance level as the Trace
        coefficient.
      • Wilks' lambda, U. This is the most common,
        traditional test where there are more than
        two groups formed by the independent
        variables. It is a measure of the
        difference between groups of the centroid
        (vector) of means on the dependent
        variables. The smaller the lambda, the
        greater the differences. The Bartlett's V
        transformation of lambda is then used to
        compute the significance of lambda. Wilks'
        lambda is used, in conjunction with
        Bartlett's V, as a multivariate
        significance test of mean differences in
        MANOVA, for the case of multiple interval
        dependents and multiple (>2) groups formed
        by the independent(s). The t-test,
        Hotelling's T-Square, and the F test are
        special cases of Wilks' lambda.
      • Pillai-Bartlett trace, V. Multiple
        discriminant analysis (MDA) is the part of
        MANOVA where canonical roots are
        calculated. Each significant root is a
        dimension on which the vector of group
        means is differentiated. The
        Pillai-Bartlett trace is the sum of
        explained variances on the discriminant
        variates, which are the variables computed
        from the canonical coefficients for a given
        root. Olson (1976) found V to be the most
        robust of the four tests, and it is
        sometimes preferred for this reason.
      • Roy's greatest characteristic root (GCR) is
        similar to the Pillai-Bartlett trace but is
        based only on the first (and hence most
        important) root. Specifically, let
        $\lambda$ be the largest eigenvalue; then
        $\mathrm{GCR} = \lambda / (1 + \lambda)$.
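
The four statistics can all be written in terms of the eigenvalues
of E^-1 H, where E and H are the error and hypothesis matrices. The
following is a minimal Python sketch under that standard
formulation, not JMP's own implementation. The function names and
the eigenvalues are hypothetical, and Roy's GCR is computed as
lambda / (1 + lambda), which is how the formula on this slide is
read.

```python
from math import prod


def manova_statistics(eigenvalues):
    """Classical MANOVA statistics from the eigenvalues of E^-1 H."""
    largest = max(eigenvalues)
    return {
        # Wilks' lambda: product of 1 / (1 + eigenvalue); smaller = bigger differences
        "wilks_lambda": prod(1.0 / (1.0 + ev) for ev in eigenvalues),
        # Pillai-Bartlett trace: sum of eigenvalue / (1 + eigenvalue)
        "pillai_trace": sum(ev / (1.0 + ev) for ev in eigenvalues),
        # Hotelling-Lawley trace: sum of the eigenvalues
        "hotelling_lawley_trace": sum(eigenvalues),
        # Roy's greatest characteristic root, based on the largest eigenvalue only
        "roy_gcr": largest / (1.0 + largest),
    }


def trace_to_t_square(trace, n_total, n_groups):
    """Convert Hotelling's Trace to T-Square by multiplying by (N - g)."""
    return trace * (n_total - n_groups)


# Made-up eigenvalues for illustration only (not from the sediment data).
stats = manova_statistics([3.0, 1.0])
for name, value in stats.items():
    print(name, value)

# Two-group example: trace 0.5 with N = 12 observations and g = 2 groups.
print(trace_to_t_square(0.5, n_total=12, n_groups=2))
```

Because Roy's statistic uses only the largest eigenvalue, it can be
significant even when the other three statistics are not.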

7
Heavy metals in marine sediments, differences
between locations
The University of Melbourne is doing research on
marine sediments. The main question is whether
the levels of heavy metals in these sediments
change when the location changes. This database
study addresses that question, and the analysis
is done using JMP software.
Research objectives
One objective of this particular study was to
determine whether the percentage of heavy metals
in marine sediments differs between locations.
Treatment Design
The treatments included the quantities of copper
(CU), lead (PB), nickel (NI) and manganese (MN).
At every location we took four different samples
and ran the experiment on them. We did this at
the three locations, which are Delray, Seaspray
and Woodside.
Experiment Design
Samples of marine sediments were randomly
assigned to the four treatments in a completely
randomized design. The treated marine sediment
samples were placed in airtight containers and
incubated under conditions conducive to microbial
activity. This experiment was done to discover
whether the percentage of heavy metals in marine
sediments differs when the sample is taken from
different locations, that is, whether location
has an effect on these percentages. These
differences can be assessed by measuring the main
heavy metals in the sediment. The heavy metals
were measured at three different locations. At
each location we have four different samples, so
that we can reach a general conclusion about this
kind of marine sediment. The heavy metal quantity
in each marine sediment sample was recorded on an
idealized experiment area. The data is already
shown in the previous slides, and a profile plot
from the Fit Model MANOVA personality is shown on
the right.
8
Database Variables Definition
  • Site: the first variable in this database is
    the site, the place where the experiment was
    done. The database has three different sites
    (Delray Beach, Seaspray, Woodside). This
    variable is important because it divides the
    data into three groups depending on the area
    where the experiment was done. The following
    variables are continuous, as CU, PB, NI and MN
    are dependent variables; this is one of the
    main conditions of MANOVA.
  • Factor Variable
      • Site: the area from which the marine
        sediments were taken.
  • Responses
      • CU: the concentration of copper in marine
        sediment; at every site we have four
        different tests.
      • PB: the concentration of lead in marine
        sediment.
      • NI: the concentration of nickel in marine
        sediment.
      • MN: the concentration of manganese in
        marine sediment.
  • These last four variables are considered the
    most important because they show us the
    differences or similarities of the marine
    sediments at the three sites. The other
    variables used in this database are the log10
    transformations of the variables mentioned
    above, in addition to the log10 transformation
    of FE:
      • LCU - log10 transformation of CU
      • LPB - log10 transformation of PB
      • LNI - log10 transformation of NI
      • LMN - log10 transformation of MN
      • LFE - log10 transformation of FE
  • This data meets the MANOVA conditions, as we
    have continuous dependent variables and a
    categorical independent variable.
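
The log10 columns are simple derived variables. A minimal Python
sketch of how they could be computed is shown here; the function
name add_log10_columns and the concentrations are made up for
illustration and are not values from the Melbourne database.

```python
from math import log10


def add_log10_columns(row, names=("CU", "PB", "NI", "MN")):
    """Return a copy of row with an L-prefixed log10 column per metal."""
    out = dict(row)
    for name in names:
        out["L" + name] = log10(row[name])
    return out


# Hypothetical observation; concentrations must be positive for log10.
sample = {"SITE": "Delray Beach", "CU": 10.0, "PB": 100.0, "NI": 1.0, "MN": 1000.0}
transformed = add_log10_columns(sample)
print(transformed["LCU"], transformed["LPB"], transformed["LNI"], transformed["LMN"])
```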

9
Database of heavy metals in marine sediments,
differences between locations.
  • SITE - sites from which data were collected
    (Delray Beach, Seaspray, Woodside)
  • CU - concentration of copper
  • PB - concentration of lead
  • NI - concentration of nickel
  • MN - concentration of manganese
  • LCU - log10 transformation of CU
  • LPB - log10 transformation of PB
  • LNI - log10 transformation of NI
  • LMN - log10 transformation of MN
  • LFE - log10 transformation of FE
[Figure: marine sediments, with the sites labelled
Seaspray, Woodside and Delray]
10
My models (scripts)
The database in JMP program
This picture shows how the data looks in the JMP
program during the analysis. For the meaning of
each variable, see the previous slide.
11
Correlation between variables
The table shown is called the Correlations table.
It is a matrix of correlation coefficients that
summarizes the strength of the linear
relationships between each pair of responses,
which are the heavy metal variables (CU, PB, NI,
MN, LCU, LFE, LNI, LMN). The scatter plot matrix
on the left shows that there is a relationship
between the different variables in the data, so
the data is correlated; the correlation is not
high, but it is present. For example, plotting
the LFE variable against LNI shows a strong
relationship between the two.
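
A correlation matrix of this kind can be computed from Pearson
correlation coefficients for every pair of variables. The following
is a minimal Python sketch; the function names are hypothetical, and
the values are made up for illustration, so they are not the
correlations reported by JMP for this database.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns}
            for a in columns}


# Hypothetical values for three of the log-transformed variables.
data = {
    "LFE": [1.0, 2.0, 3.0, 4.0],
    "LNI": [2.0, 4.0, 6.0, 8.0],
    "LMN": [1.0, 3.0, 2.0, 4.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 3) for r in row.values()])
```

The diagonal is always 1, and the matrix is symmetric, so only the
entries on one side of the diagonal carry information.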
12
Description of how some analyses work in JMP.
  • In MANOVA analysis you can select a response
    design that indicates whether you want to use
    the response variables individually or in some
    linear combination. JMP, like most software,
    supports several response designs, but it also
    allows you to build your own. Included designs:
      • Repeated Measures: automatic analysis of a
        repeated measures design. This is the
        design used in this database analysis.
      • Sum: the sum of the responses, one value
      • Identity: each response, the identity
        matrix (no transformation, M = I)
      • Contrast: each response (except the first)
        minus the first
      • Polynomial: orthogonal polynomials
      • Helmert: each response versus the ones
        after it, except the last
      • Profile: each response versus all others,
        except the last
      • Mean: each response versus the mean of the
        others, except the last
      • Compound: for responses forming a compound
        of more than one effect
      • Custom: any M matrix you want to enter and
        edit yourself.
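
A response design can be thought of as a matrix M that is applied to
each row of responses. The following minimal Python sketch builds M
for three of the designs above (Sum, Identity, Contrast) and applies
it. The function names are hypothetical, the numbers are made up,
and JMP may scale or order the columns differently.

```python
def sum_design(k):
    """k x 1 matrix: one column that adds up all k responses."""
    return [[1.0] for _ in range(k)]


def identity_design(k):
    """k x k identity matrix: each response on its own (M = I)."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def contrast_design(k):
    """k x (k-1) matrix: column j is response j+1 minus the first response."""
    return [[-1.0 if i == 0 else (1.0 if i == j + 1 else 0.0)
             for j in range(k - 1)]
            for i in range(k)]


def apply_design(rows, m):
    """Multiply each row of responses by the design matrix M."""
    return [[sum(y[i] * m[i][j] for i in range(len(m)))
             for j in range(len(m[0]))]
            for y in rows]


# One hypothetical observation with three responses.
responses = [[1.0, 2.0, 4.0]]
print(apply_design(responses, sum_design(3)))       # one summed value
print(apply_design(responses, identity_design(3)))  # responses unchanged
print(apply_design(responses, contrast_design(3)))  # differences from the first
```

A Custom design corresponds to passing any other matrix M with one
row per response to the same multiplication.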

13
The third column shows the cumulative percent of
variation represented by the eigenvalues. The
first three principal components account for
93.6277% of the variation in the sample.
The Spinning Plot platform displays a
three-dimensional spinnable plot.
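
The cumulative percent column follows directly from the eigenvalues:
each eigenvalue is divided by the total, and the percentages are
added up in order. A minimal Python sketch, with a hypothetical
function name and made-up eigenvalues rather than the ones behind
the 93.6277 figure:

```python
from itertools import accumulate


def cumulative_percent(eigenvalues):
    """Cumulative percent of variation, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    percents = [100.0 * ev / total for ev in ordered]
    return list(accumulate(percents))


# Hypothetical eigenvalues of a correlation matrix of four variables.
print(cumulative_percent([4.0, 2.0, 1.0, 1.0]))
```

Reading off the third entry of the result gives the share of
variation explained by the first three principal components.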
14
Analysis with MANOVA
  • Least Squares Means Report
  • For each pure nominal effect, this graph gives
    the overall least squares means of all the
    heavy metals and their log transformations,
    along with profile plots of the means. It shows
    the profile plot of the metals and their log
    transformations and the table of least squares
    means.
  • The second graph shows the means for every
    site, so it breaks down the overall means by
    the location each sample was taken from. It
    also includes the table of least squares means
    organized by site (location). From here we can
    see that the least squares mean for the
    Woodside area is higher than for Seaspray, and
    the Seaspray result is in turn higher than for
    Delray. This can also be seen in the table of
    least squares means for every variable shown
    under this graph.
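
With a single nominal factor (site) and no other terms in the model,
the least squares mean for a site is simply the average of the
observations at that site. A minimal Python sketch under that
assumption; the function name site_means and the values are made up
for illustration and are not the means reported by JMP.

```python
from collections import defaultdict


def site_means(records, response):
    """Average one response within each site (one-factor model only)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec["SITE"]] += rec[response]
        counts[rec["SITE"]] += 1
    return {site: totals[site] / counts[site] for site in totals}


# Hypothetical records; the real database has four samples per site.
records = [
    {"SITE": "Delray", "LCU": 1.0},
    {"SITE": "Delray", "LCU": 2.0},
    {"SITE": "Seaspray", "LCU": 3.0},
    {"SITE": "Seaspray", "LCU": 5.0},
    {"SITE": "Woodside", "LCU": 6.0},
    {"SITE": "Woodside", "LCU": 8.0},
]
print(site_means(records, "LCU"))
```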

15
Partial covariance and correlation tables
The Partial Correlation table here shows the
covariance matrix and the partial correlation
matrix of residuals from the initial fit. The
partial correlation table shows the partial
correlations of each pair of variables after
adjusting for all the other variables, so we can
see how each heavy metal relates to the others.
Notice that the diagonal of the partial
correlation matrix is always 1.
  • The main ingredients of multivariate tests are
    the E and the H matrices.
  • The elements of the E matrix are the cross
    products of the residuals.
  • The H matrices correspond to hypothesis sums of
    squares and cross products.
  • There is an H matrix for the whole model and for
    each effect in the model. Diagonal elements of
    the H and E matrices correspond to the
    hypothesis (numerator) and error (denominator)
    sums of squares for the univariate F tests. New
    E and H matrices for any given response design
    are formed from these initial matrices, and the
    multivariate test statistics are computed from
    them.
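
For a one-factor design, the E and H matrices can be built directly
from the group means. The following is a minimal Python sketch of
that construction, not JMP's implementation. The function names are
hypothetical, and the data are made up: two responses, three sites
and two samples per site.

```python
def add_outer(matrix, vec, weight=1.0):
    """Add weight * vec * vec^T to matrix in place."""
    for i, a in enumerate(vec):
        for j, b in enumerate(vec):
            matrix[i][j] += weight * a * b


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def e_and_h(groups):
    """Error (E) and hypothesis (H) cross-product matrices for one factor."""
    all_rows = [row for rows in groups.values() for row in rows]
    p = len(all_rows[0])
    grand = column_means(all_rows)
    E = [[0.0] * p for _ in range(p)]
    H = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        mean = column_means(rows)
        for row in rows:
            # E: cross products of residuals around the group mean
            add_outer(E, [y - m for y, m in zip(row, mean)])
        # H: group size times cross products of (group mean - grand mean)
        add_outer(H, [m - g for m, g in zip(mean, grand)], weight=len(rows))
    return E, H


def univariate_f(E, H, k, n_total, n_groups):
    """Univariate F for response k from the diagonals of H and E."""
    return (H[k][k] / (n_groups - 1)) / (E[k][k] / (n_total - n_groups))


# Hypothetical data: site -> rows of [response 1, response 2].
groups = {
    "Delray": [[1.0, 2.0], [3.0, 4.0]],
    "Seaspray": [[5.0, 6.0], [7.0, 10.0]],
    "Woodside": [[9.0, 8.0], [11.0, 12.0]],
}
E, H = e_and_h(groups)
print("E =", E)
print("H =", H)
print("F for response 1 =", univariate_f(E, H, 0, n_total=6, n_groups=3))
```

The sum E + H reproduces the total sums of squares and cross
products around the grand mean, which is a quick check that the two
matrices were built consistently.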

16
The MANOVA Analysis
In this MANOVA analysis we chose the Repeated
Measures response because the data has several
observations at every site. In the F test, the
intercept test has a Prob>F of 0.0001, which is
less than 0.05 (α). Among the multivariate
tests, Roy's max root test has a Prob>F of
0.0353, which is also less than 0.05. From this
we can conclude that the level of heavy metals in
marine sediments differs when we change the
location the sample is taken from. The result of
this test shows that the level of heavy metals
varies with the location.

17
Conclusion
  • After applying a statistical analysis to the
    data on marine sediments at three different
    locations (Woodside, Seaspray, Delray), we find
    that the heavy metal levels in the marine
    sediments depend on the area where the
    sediments are located.

18
References
Gerry Quinn and Mick Keough, Experimental Design
and Data Analysis for Biologists, Chapter 16:
Multivariate analysis of variance and
discriminant analysis. Cambridge University
Press, 2002.
http://www.jmp.com/ and other alternative
websites.
PowerPoint presentation on Marketing Research,
Part B: Continuous Data Applications,
Multivariate Analysis, and other PDF and printed
material. Dr. Andy Mauromoustakos.
JMP version 5 help.
The database is a study of the University of
Melbourne.