Methodology II
Provided by: edinboroun

Transcript and Presenter's Notes

1
Methodology II
2
Sampling/Data Collection
  • Research is almost always directed at
    characterizing and understanding a segment of the
    world, a population, on the basis of observing a
    smaller segment, or sample.
  • By definition, a population is the entire group
    of interest in a research study (e.g., all
    learning disabled children in the U.S.).
  • Populations are defined, not by nature, but by
    rules of membership invented by investigators
    (e.g., all children with mental handicaps
    currently residing in NW PA).

3
Sampling/Data Collection
  • In contrast, a sample is some subset of a
    population.
  • It is a select group from the population chosen
    to represent this population.
  • A sample can be any size, as long as it contains
    fewer members than the population from which it
    is drawn.
  • The most accurate information about a population
    will come from a sample that is representative of
    the population from which it is selected.

4
Sampling/Data Collection
  • In order to get an accurate picture of the
    population as a whole, all of its characteristics
    must be represented in the sample in appropriate
    proportions.
  • A sample is said to be biased when it is not
    representative of the entire population to which
    an investigator wants to generalize.
  • A representative sample is accomplished through
    sampling methods.

5
Random Sampling
  • A sample is random when
  • Every member of the population has an equal
    chance of being selected to be in the sample and
  • The selection of any one member of the population
    does not influence the chances of selecting any
    other member.
  • One very simple way to obtain a random sample is
    to put the names or code numbers of all members
    of the population into a hat, shake them up, and
    without looking, draw out enough for your sample.

6
Random Sampling
  • This is usually the way winning lottery tickets
    are selected and the procedure gives each ticket
    an equal chance of winning.
  • Another procedure frequently used to obtain a
    random sample is a Table of Random Numbers.
  • The numbers in the table are generated by a
    computer so that every digit is as likely to
    appear as every other.
  • If you want to select a sample of 100 cases from
    a population of 500, you would first assign every
    member of the population a number.

7
Random Sampling
  • Then you would enter the table at any point, and
    then read the numbers until you had found 100
    numbers between the values of 1 and 500.
  • Any numbers that you encounter higher than 500
    you would ignore.
  • Random sampling does not eliminate all
    possibility of error, but it does guard against
    any systematic bias slipping into the selection
    of the sample.
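The table-of-random-numbers walkthrough above maps directly onto code. A minimal Python sketch (the seed and sizes are illustrative): every member of a population of 500 is assumed to have been assigned a number from 1 to 500.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulate reading a table of random numbers: draw three-digit
# numbers, ignore any value above 500, and stop once 100 distinct
# numbers have been collected.
sample = set()
while len(sample) < 100:
    candidate = rng.randint(1, 999)
    if candidate <= 500:
        sample.add(candidate)

sample = sorted(sample)
# random.sample(range(1, 501), 100) reaches the same result directly.
```

Either route satisfies both conditions for randomness: equal chance of selection and independence between draws.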

8
Systematic Random Sampling
  • A slight variation on simple random sampling is
    systematic random sampling.
  • Subjects are selected from a population listing
    (e.g., a phone book) in a systematic way (e.g.
    every 10th name).
  • This method is fast and easy, but it is accurate
    only if the listing of the population is not
    biased in any way (e.g., against people without
    phones or with unlisted numbers).
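The every-kth-name rule can be sketched with a list slice; the names below are illustrative stand-ins for a population listing such as a phone book.

```python
import random

# Systematic random sampling: every 10th name from a listing.
directory = [f"person_{i:03d}" for i in range(1, 201)]

k = 10                       # sampling interval: every 10th name
start = random.randrange(k)  # random start within the first interval
sample = directory[start::k]
# 200 names sampled every 10th yields 20 subjects.
```

Randomizing the starting point keeps every name's chance of selection equal, even though the draws are no longer independent.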

9
Stratified Sampling
  • Stratified sampling involves a family of
    sophisticated sampling techniques.
  • These methods use known characteristics of the
    population and a sample is selected
    (proportionate or disproportionate) based upon
    these known characteristics.
  • Say, for example, you had a population of 100
    people: 70 were male and 30 were female.
  • If you took a proportionate stratified sample of
    10, you would have 7 males and 3 females in your
    sample.
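The 70/30 example above can be sketched in code; the population entries are illustrative, and the simple rounding used here may need adjustment for proportions that do not divide evenly.

```python
import random

population = [("male", i) for i in range(70)] + \
             [("female", i) for i in range(30)]

def stratified_sample(pop, key, n):
    """Draw n subjects, allocating to each stratum in proportion
    to its share of the population."""
    strata = {}
    for member in pop:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, share))
    return sample

sample = stratified_sample(population, key=lambda m: m[0], n=10)
# Yields 7 males and 3 females, mirroring the population.
```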

10
Stratified Sampling
  • The variable of gender would be the stratifying
    variable upon which the sample is proportionately
    controlled.
  • Proper representation of the sample to the
    population on this stratifying variable has now
    been ensured.
  • Complex strata can be simultaneously controlled
    in any one sample.
  • If the strata of income, age, and gender for a
    population are already known, a sample can be
    selected to ensure proportional representation on
    all three of these strata simultaneously.

11
Stratified Sampling
  • In rough terms, statistical tests require about
    30 subjects for group category analysis.
  • When using a multistage stratified sampling
    method to represent a large population, the
    number of subjects in each category of each
    strata must still be at or around a minimum of
    30.
  • So, for the stratifying variable of gender, a
    minimum of 60 total subjects would be required:
    30 males and 30 females.
  • The most common stratified sampling error
    involves the use of too many stratifying
    variables on inadequate sample sizes.

12
Sample Size Determination
  • A common sampling question is how many subjects
    need to be sampled to accurately reflect the
    population?
  • There are numerous statistical and
    non-statistical approaches to this issue.
  • Unfortunately, the statistical processes for
    estimating sample size are not used frequently in
    professional research.
  • Sample sizes are usually derived from similar
    studies, advisor recommendations, or the
    researcher's own common sense.
  • The nature of the study often dictates specific
    sample size.

13
Sample Size Determination
  • For example, in aphasiology, size is usually
    determined by the availability of the subjects;
    usually 15 subjects per group (e.g., left CVAs,
    right CVAs, and normals) are recommended.
  • When collecting data on normals (e.g., African
    American adults), 50 members per subject grouping
    has been recommended, with a minimum of 10
    members per stratified variable (e.g., SES).
  • All initial sample size estimates should be
    adjusted upward to compensate for attrition
    (e.g., subject drop out), subject refusal to
    participate, or other similar circumstances.

14
Data Collection Methods
  • Once the sample is determined, one needs to
    consider how to collect the data.
  • The three most common data collection methods are
    group administration, mail administration, and
    in-person administration.
  • The group administered method of data collection
    is relatively accurate, low in cost, and
    traditionally accepted, particularly in education
    (classroom) research.

15
Group Administration
  • Very commonly, data is collected by the
    researcher, classroom teacher, or other
    professional within a group setting.
  • Regardless of who collects the data (e.g.,
    researcher, classroom teacher, field worker)
    caution must be exercised to ensure that biases
    do not result.
  • To protect the quality of the data collected, the
    instructions and timelines regarding the project
    must be implemented exactly as designed.
  • Alterations in these areas can introduce serious
    invalidity into the research design.

16
Mail-Administration
  • Mail questionnaires have advantages of subject
    privacy and convenience.
  • They are also relatively low in cost to
    implement.
  • Mail surveys have notoriously low response rates
    of approximately 25%, resulting in time-consuming
    follow-up of non-respondents.
  • If only interested persons return the survey,
    serious biasing could result.
  • Mailing lists used for sampling may be inherently
    biased depending upon the source of the list.

17
Mail-Administration
  • Concerning the instrument itself, instructions
    must be absolutely clear on mailed questionnaires
    since the respondent completes the form
    privately.
  • Question wording must be unambiguous and response
    scales, if used, must facilitate an ease of
    answering.
  • The key to success in using mail questionnaires
    involves meticulous planning of the instructions,
    questions, response types, form layout, and
    follow-up activities.

18
In-Person Administration
  • Personal interviewing or individual test
    administration is often used in behavioral
    research.
  • The researcher or other qualified field worker
    administers the test or interviews each subject
    to gain information regarding test performance or
    in-depth reactions and/or attitudes specific to a
    research topic.
  • Examiner bias must be carefully controlled.
  • Audio- or video-taping is frequently used as
    scoring and response reliability issues must
    often be addressed.

19
Instrumentation and Testing
  • Instruments are used to collect data.
  • An instrument may be a published or original
    test or survey form, an unobtrusive measuring
    device, or another type of measurement tool.
  • All instruments must be selected for use based on
    their data collection validity, reliability, and
    practicality factors.

20
Validity
  • All research instruments must first be considered
    in terms of their validity.
  • Validity simply refers to the question: does the
    instrument measure what it is supposed to
    measure?
  • Test or survey items must be germane to the
    subject area under investigation.
  • Three formal methods for evaluating instrument
    validity are content validity, criterion-related
    validity, and construct validity.

21
Content Validity
  • Content validity is the extent to which a
    measurement reflects the specific intended domain
    of content (Carmines & Zeller, 1991).
  • It consists of logical thought and judgment as
    the method to derive valid test or survey items.
  • There is no quantitative evidence to objectively
    and scientifically demonstrate the instrument's
    validity; only the researcher's opinion.

22
Criterion-related Validity
  • Criterion-related validity is used to demonstrate
    the accuracy of a measuring procedure by
    comparing it with another procedure which has
    been demonstrated to be valid.
  • Also referred to as instrumental validity, it is
    a method by which correlation coefficients are
    established for the instrument.
  • There are two common types of criterion-related
    validity: concurrent validity and predictive
    validity.

23
Criterion-related Validity
  • Both of these validity types utilize statistical
    correlation to arrive at a value or validity
    coefficient for the instrument.
  • The correlation may range from 0.0 to 1.00.
  • The closer to 1.00 the correlation coefficient
    is, the stronger the criterion-related validity.
  • Concurrent validity is the extent to which a test
    yields the same results as other measures of the
    same phenomenon.
  • A researcher may initially use content validity
    to develop test items.

24
Criterion-related Validity
  • The validity of a researcher-made instrument can
    be assessed concurrently by a correlation with a
    standardized published instrument, i.e., the
    criterion.
  • Subjects are tested twice, using both the
    researcher-made test and a standardized test
    instrument.
  • If the researcher-made test is in fact a valid
    test, scores from subjects on this test should be
    closely related, i.e., statistically correlated,
    to test scores derived from the standardized
    published instrument.

25
Criterion-related Validity
  • Predictive validity is the extent to which a
    measure accurately forecasts how a person will
    think, act, and feel in the future.
  • Consider the situation in which the same
    researcher collects data over a series of months
    and notices that subjects that do well on the
    researcher-made test tend to also do well on
    another test.
  • The closer the predictive correlational
    coefficient value is to 1.00, the higher, and
    better, the predictive validity of the
    instrument.

26
Criterion-related Validity
  • Reasonable validity coefficient ranges for
    criterion-related validity measures are as
    follows:
  • 1.00 to .90 (excellent)
  • .89 to .85 (good)
  • .84 to .80 (fair)
  • .79 or less (poor).
  • The quality of all criterion-related validity
    measures relies heavily upon the quality of the
    criterion measure.
  • In other words, a high concurrent validity
    between a researcher-made test and a published
    standardized test could be indicating that both
    tests are equally poor as opposed to equally
    good!

27
Construct Validity
  • The highest form of validity is construct
    validity.
  • It seeks an agreement between a theoretical
    concept and a specific measuring device, such as
    observation.
  • Construct validity utilizes multivariate factor
    analysis to develop factors, or constructs,
    within each test or survey instrument.

28
Construct Validity
  • For example, in an instrument measuring
    self-esteem, 150 original survey questions may be
    factor-analyzed into five specific factors of
    self-esteem: general, personal, social, academic,
    and professional.
  • In other words, all the instrument items could be
    categorized under five constructs.
  • Construct validity is a powerful and
    sophisticated approach to instrument validity.

29
Construct Validity
  • The factor analytic procedure generally allows
    the researcher to gain new insights into the
    quality of the test items and the interrelations
    between questions.
  • The approach is best used when numerous questions
    are involved (all having the same response scale)
    and many subjects are used in the research.

30
Reliability
  • Reliability evaluates the consistency of the
    measurements.
  • Reliability measurements are presented as
    correlational coefficients.
  • The higher the correlation value, the more
    reliable the instrument.
  • As with validity, correlations may range from 0.0
    to 1.00.
  • A correlation of 1.00 represents perfect
    reliability within an instrument.

31
Reliability
  • A correlation of .20 would reflect a quite
    unreliable instrument.
  • Three techniques are used to assess reliability:
    test-retest, split-half, and equivalent forms.
  • Test-retest reliability is established by
    administering a test or a survey twice to the
    exact same group of subjects with a short time
    lapse between testing.
  • The correlation can be done either by item, or,
    more commonly, by total test score.

32
Reliability
  • A correlational coefficient is calculated to
    measure the amount of relationship between
    subjects' first and second test answers or test
    scores.
  • Theoretically, the subjects should receive the
    identical score both times, if the test is
    consistent (i.e., reliable).
  • Practice effects may create spurious results, so
    test-retest is only recommended for use when
    other reliability methods are not feasible.

33
Reliability
  • Split-half reliability is an improved variation
    on test-retest reliability.
  • Test items are put in order of difficulty (if a
    cognitive test) or by subject matter (if an
    attitudinal test), and then the test items are
    split: version A with the odd questions, version
    B with the even questions.
  • The theory is that if the total test is reliable,
    subjects should have highly correlated scores
    between the two versions, even and odd.

34
Reliability
  • Split-half reliability is a reasonable method of
    evaluating reliability depending upon how equally
    the total test can be divided.
  • It is not an appropriate method for timed tests.
  • Another reliability method involves developing
    equivalent forms: two completely separate but
    equal tests are created.
  • The subject group is tested twice, once with each
    form of the test (e.g., the PPVT Form L, Form M).
  • A correlation coefficient calculated from both
    test scores on all subjects will indicate the
    reliability of the tests.

35
Reliability
  • The success of this method depends greatly on the
    true equivalency of the two test versions.
  • Writing test items to match so closely can be
    much more difficult than it sounds.
  • Also, equivalent forms reliability requires two
    separate administrations of the instrument which
    all takes time and money.

36
Item Analysis
  • Item analysis is a powerful evaluative tool that
    can be applied to either cognitive or attitudinal
    instruments for recognizing instrument
    weaknesses, for test scoring, and for calculating
    internal consistency reliability measures.
  • It is done to determine each test item's ability
    to discriminate between high-scoring and
    low-scoring subjects.
  • This analysis involves a correlational
    calculation between the total test score and the
    item score.

37
Item Analysis
  • Computing correlational coefficients for each
    test item allows the researcher to evaluate each
    item's effectiveness and consistency in relation
    to the total test.
  • A high scoring respondent should answer test
    questions consistently in a high scoring
    direction.
  • So if the correlation between scoring high on the
    total test and scoring high on one particular
    question is strong, the question must be a good
    one.

38
Instrument Selection
  • When collecting data, you need to consider
    whether to select a published instrument or
    develop your own.
  • You might select a published instrument because
    of its professional acceptance and because
    validity, reliability, and perhaps even item
    analysis data have already been established and
    acknowledged in the test manual.
  • The instrument has probably also been piloted and
    revised throughout the years.

39
Instrument Selection
  • To review published instruments, go directly to a
    tests-in-print source like the Buros Institute of
    Mental Measurements.
  • Read the test review to see if the test is one
    that might be appropriate for your study.
  • Also consider reviewing how a published
    instrument is perceived by professional journals.

40
Instrument Selection
  • The time spent investigating the instrument must
    be measured against the time period involved in
    developing an original test, which is generally
    the only remaining alternative.
  • Many times numerous published instruments fit
    closely to the topic being studied.
  • In such cases, each instrument must be evaluated
    individually across a consistent array of
    parameters defining the ideal instrument.

41
Instrument Selection
  • Only consider published instruments that have
    validity and reliability measures available.
  • If such data is unavailable, be suspicious.
  • If a published instrument can be located for your
    topic of interest, it is worth the effort to
    consider it strongly, but not blindly.
  • Although rare, a research project may be so
    unusual, creative, or innovative that few, if
    any, published instruments are appropriate for
    use.
  • In this case, the researcher must devise an
    original instrument.

42
Instrument Selection
  • The following steps may help you develop a
    scientific, original instrument:
  • Review the most similar existing instruments.
  • Available instruments may not even measure the
    specific topic of your study, but they may yield
    some new ideas regarding question form, response
    scales, test length, calibration, etc.
  • In writing original items, start by first listing
    one to five major (autonomous) concepts that are
    to be investigated by the instrument.

43
Instrument Selection
  • Weigh each major concept in importance; assign
    numerical values to these categories if possible.
  • Decide on the total number of questions desired.
  • If in doubt, use a conservative estimate, usually
    20 to 50 items.
  • Remember to consider the respondent's interest
    level, attention span, and fatigue factors when
    deciding on a questionnaire's or test's length.

44
Instrument Selection
  • Estimate, using the weights assigned earlier, how
    many questions need to be developed for each
    major concept within the instrument. The more
    important concepts require more questions.
  • Develop response scales for each item. Try to
    stay consistent in types of response scales
    utilized.
  • If using a Likert-type (e.g., five-point)
    response scale, use it consistently throughout
    the instrument or test section.

45
Instrument Selection
  • Categorize all test items and read them to a
    peer.
  • Make corrections to eliminate ambiguous wording;
    combine similar items which ask the same or
    similar questions.
  • Do not overestimate or underestimate the
    respondents' reading aptitude.
  • Overestimation of respondent reading skills can
    cause item misunderstanding, misinterpretation,
    and lowered validity and reliability.

46
Instrument Selection
  • Underestimation of respondent reading skills can
    cause levity, insult, or even resentment toward
    the instrument and even the entire test.
  • Refine the number of items down to the original
    estimate of 20-50.
  • Consider revising the wording of a few (10-20)
    final items to reduce the halo effect.
  • At this point, your instrument should possess at
    least defendable content validity.
  • Informally test the items for clarity with a very
    small group similar to the project respondent
    group.

47
Instrument Selection
  • Formally pilot test the instrument.
  • Typically, in cognitive studies, open-ended,
    write-in answers, true/false questions, or
    multiple-choice formats are utilized.
  • If questions are attitudinal, the Likert-type
    scales are commonly used.
  • Likert-type scales generally have five to seven
    response choices in degrees of progressive
    feeling (e.g., 1 = strongly agree; 2 = agree;
    3 = neutral; 4 = disagree; etc.).

48
Hazards in Testing
  • After item writing and response scale selection,
    there are some specialized hazards to consider
    with your research design.
  • The "good subject syndrome" and the
    "self-fulfilling prophecy" are hazards
    encountered with attitudinal surveys.
  • The "good subject" is the respondent who is
    genuinely attempting to help the researcher by
    answering an attitudinal question as it "should"
    be answered for the research's sake, but not as
    he/she really feels.

49
Hazards in Testing
  • Watch for responses which are aimed at satisfying
    the research or project goals instead of
    providing accurate and sincere evaluative data.
  • The self-fulfilling prophecy occurs when the
    respondent answers questions the way he/she would
    like to see him/herself, instead of how he/she
    really sees him/herself.
  • This can be hard or impossible to detect for
    certain, but be aware of its possibility.

50
Hazards in Testing
  • In studies involving psychological motivations or
    controversial subjects (e.g., sex, religion,
    politics), the self-fulfilling prophecy can
    emerge easily and weaken the data tremendously.
  • The halo effect can be a common problem in
    studies which involve long checklists of
    evaluative questions.
  • The respondent may get into the habit of
    evaluating all items as agree regardless of
    his/her attitude toward the question.

51
Hazards in Testing
  • Do not design an instrument in which the
    respondent will need to assess numerous attitudes
    over a very large number of questions using an
    identical response scale.
  • The problem can usually be remedied by reversing
    the wording on various items at strategic
    locations in the instrument.
  • Also, use subparts within the instrument or allow
    short rest periods to help break up the test
    administration.

52
Hazards in Testing
  • The Hawthorne effect involves how subjects
    react in a study if they know they are being
    watched.
  • In an experiment years ago, workers demonstrated
    different skills and abilities simply by virtue
    of the fact they were being studied.
  • The experiment was conducted at the Hawthorne
    plant of Western Electric Company where the
    effect was first recognized.
  • Particularly in behavioral research, respondents
    may alter their normal pattern of responses due
    merely to their knowledge that they are being
    studied, and not to the experimental treatment.

53
Statistical Analysis
  • Statistical analysis provides an objective tool
    for researchers to use in measuring their
    findings and comparing them to their previous
    expectations.
  • The first step to locating the right statistic
    for your research hypothesis and design is to
    consider the nature of the data you are
    eliciting/collecting.

54
Data
  • Continuous data are comprised of ongoing, varying
    values.
  • Among the many examples are number of years at a
    residence, age, distances, test scores, scaled
    scores, IQs, yearly income, height, and weight.
  • Continuous data permit assessments of mean,
    range, standard deviation, variance, as well as
    other statistical options.

55
Data
  • Categorical (discrete, discontinuous) data are
    data which fall into groupings or divisions.
  • Common examples of categorical data include
    gender, political affiliation, blood type,
    favorite color, etc.
  • Continuous data can be transformed later into
    categorical data, but categorical data can never
    be later made continuous.

56
Measurement Scales
  • Data fall into one of four measurement scales:
  • nominal
  • ordinal
  • interval, or
  • ratio.
  • Remember the acronym NOIR for the correct order
    from the lowest and weakest measurement scale
    (nominal) to the highest and strongest
    measurement scale (ratio).

57
Measurement Scales
  • Nominal and ordinal measurements are common in
    social and behavioral sciences.
  • Data measured by either nominal or ordinal scales
    must be analyzed by nonparametric methods.
  • Data measured on the interval or ratio scales may
    be analyzed by parametric methods if the
    statistical model is valid for the data.

58
Nominal Data
  • A nominal variable is simply a named category.
  • For example, the psychiatric system of diagnostic
    groups constitutes a nominal scale.
  • When a diagnostician identifies a person as
    "schizophrenic" or "paranoid" or "neurotic,"
    s/he is using a categorical label to represent
    the class of people to which the person belongs.
  • Measurement at its weakest level exists when
    naming is used simply to classify an object,
    person, or characteristic.
  • In a nominal scale, the scaling operations
    partition a given class into a set of mutually
    exclusive subclasses.

59
Nominal Data
  • Whenever a sample of data is collected in such a
    way that each observation is assigned to a
    category (e.g., the number of "no" responses
    versus the number of "yes" responses), frequency
    counts are involved.
  • Nominal data has no intrinsic measure of quantity
    attached to it.
  • We could calculate the percentage of "no"
    responses in the sample and the percentage of
    "yes" responses.
  • We could report which category had the largest
    frequency, but we could not add the "no" and the
    "yes" categories to form a third category, since
    the responses would no longer fall into a unique
    subclass.

60
Ordinal Data
  • Ordinal data, as its name implies, sets
    categories into some rank order, from highest to
    lowest.
  • It may happen that the objects in one category of
    a scale are not only different from the objects
    in other categories, but also stand in some kind
    of relation to them.
  • Ordinal measurements communicate the relative
    standings of categories, but not the amount of
    the differences among them.
  • For example, letter grades are usually assigned
    A, B, C, D, and F.

61
Ordinal Data
  • These constitute an ordering of performance: A is
    better than B, which is better than C, which is
    better than D, which is better than F.
  • Any numbers may be assigned to these letter
    grades (A = 4, B = 3, C = 2, D = 1, and F = 0),
    as long as they preserve the intended order, or
    as long as we assign a higher number to the
    member of the class which is greater or more
    preferred.

62
Interval Data
  • When a scale has all the characteristics of an
    ordinal scale, and when the distances or
    differences between any two numbers on the scale
    have meaning, then measurement is considerably
    stronger.
  • An interval scale is characterized by a common
    and constant unit of measurement which assigns a
    number to all pairs of objects in the ordered
    set.
  • For an interval scale, the zero point and the
    unit of measurement are arbitrary.

63
Interval Data
  • Temperature is measured on an interval scale.
  • If you're Canadian, you use the Celsius scale,
    but if you're American you use the Fahrenheit
    scale.
  • The unit of measurement and the zero point in
    measuring temperature are arbitrary; they are
    different for the two scales.
  • For instance, freezing occurs at 0 degrees on
    the Celsius scale but at 32 degrees on the
    Fahrenheit scale, while boiling occurs at 100
    degrees Celsius and at 212 degrees Fahrenheit.
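The freezing and boiling points above follow from the linear relation between the two scales, which can be written out explicitly:

```python
# The Celsius and Fahrenheit scales are linearly related:
# F = 1.8 * C + 32, and going back, C = (F - 32) / 1.8.
def celsius_to_fahrenheit(c):
    return 1.8 * c + 32

def fahrenheit_to_celsius(f):
    return (f - 32) / 1.8

# The freezing and boiling points from the example above:
freezing_f = celsius_to_fahrenheit(0)    # 32
boiling_f = celsius_to_fahrenheit(100)   # 212
```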

64
Interval Data
  • However, both scales contain the same amount and
    the same kind of information because they are
    linearly related.
  • That is, a reading on one scale can be
    transformed to the equivalent reading on the
    other scale by means of a linear transformation.
  • The operations and relations which give rise to
    the structure of an interval scale are such that
    numbers associated with the positions of the
    objects on the interval scale can be manipulated
    arithmetically.

65
Interval Data
  • Thus, the interval scale is the first truly
    quantitative scale we have encountered.
  • Means, standard deviations, correlations, etc.
    are applicable to interval scale data.

66
Ratio Data
  • Ratio data, the highest measurement scale, also
    sets a true quantity value on numbers, but now
    with regard to a true zero-point.
  • Common examples of ratio data include age,
    weight, height, and most test scores.
  • On a classroom test of 10 questions, a student
    getting 7 correct answers receives a score of 7.
  • This score is of the ratio measurement type,
    since a true zero-point of zero correct does in
    fact exist and is the fundamental base from which
    the score of 7 is derived.

67
Descriptive vs. Inferential Statistics
  • Descriptive statistics summarize data.
  • Data are described using standard methods to
    determine the average value, the range of data
    around the average, and other characteristics.
  • Examples of descriptive statistics include the
    mean, mode, median, standard deviation, variance,
    and response percentages.
  • Oftentimes graphs and charts are presented with
    regard to descriptive data to assist in
    explaining the statistics.
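The standard descriptive summaries named above are all available in Python's standard library; a sketch on an illustrative set of test scores:

```python
from statistics import mean, median, mode, stdev, variance

# Illustrative test scores for a small group of subjects.
scores = [70, 75, 80, 80, 85, 90, 95]

summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    "stdev": stdev(scores),
    "variance": variance(scores),
}
```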

68
Descriptive vs. Inferential Statistics
  • Suppose you have the scores on a standardized
    test for 500 subjects.
  • Instead of presenting a list of the 500 scores in
    a research report, you might present an average
    score, which describes the performance of the
    typical subject.
  • A set of data does not always consist of scores.

69
Descriptive vs. Inferential Statistics
  • For instance, you might have data on the
    political affiliations of the residents of a
    community.
  • To summarize these data, you might count how many
    are Democrats, Republicans, Independents, etc.,
    and then calculate the percentages of each.
  • A percentage is a descriptive statistic that
    indicates how many units per 100 have a certain
    characteristic.
  • Thus, if 42% of a group of people are Democrats,
    42 out of each 100 people in the group are
    Democrats.
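The political-affiliation tally above can be sketched in a few lines of Python (the party counts here are hypothetical):

```python
# Counting category frequencies and converting them to percentages,
# as in the political-affiliation example (hypothetical data).
from collections import Counter

affiliations = ["D"] * 42 + ["R"] * 38 + ["I"] * 20
counts = Counter(affiliations)
total = sum(counts.values())

percentages = {party: 100 * n / total for party, n in counts.items()}
print(percentages)  # {'D': 42.0, 'R': 38.0, 'I': 20.0}
```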

70
Descriptive vs. Inferential Statistics
  • The summaries provided by descriptive statistics
    are usually much more concise than the original
    data set (e.g., an average is much more concise
    than a list of 500 scores).
  • In addition, descriptive statistics help us
    interpret sets of data (e.g., an average helps us
    understand what is typical of a group).
  • The objective of descriptive statistics is simply
    to communicate the results without attempting to
    generalize beyond the sample of individuals to
    any other group.

71
Descriptive vs. Inferential Statistics
  • Many descriptive statistics, such as the mean and
    standard deviation, assume that the data have an
    underlying normal distribution.
  • However, the properties of nominal and ordinal
    data do not correspond with the arithmetic
    system.
  • Moreover, descriptive tools such as averages and
    percentages computed for population data should
    be called parameters, not statistics.

75
Descriptive vs. Inferential Statistics
  • A standard normal curve has the mean, median, and
    mode equal to one another.
  • The range measures the entire width of the
    distribution.
  • The kurtosis measures the flatness or peakedness
    of the curve.
  • The skewness addresses the amount of curve
    imbalance between right and left halves of the
    distribution.

76
Descriptive vs. Inferential Statistics
  • Inferential statistics are tools that tell us how
    much confidence we can have when we generalize
    from a sample to a population.
  • The goal of inferential statistics is to
    determine the likelihood that observed
    differences could have occurred by chance as a
    result of the combined effects of unforeseen
    variables not under direct control of the
    experimenter.
  • An inferential test of a null hypothesis yields,
    as its final result, a probability of obtaining
    the observed result if the null hypothesis is
    true.

77
Descriptive vs. Inferential Statistics
  • The symbol for probability is a lower-case p.
  • Thus, if we find that the probability of
    obtaining our result under the null hypothesis is
    less than 5 in 100, this result would be
    expressed as p < .05.
  • In other words, a result this extreme would occur
    less than five percent of the time if the null
    hypothesis were true, so the null hypothesis is
    probably not true.
  • There is always some probability that the null
    hypothesis is true, so researchers have settled
    on the .05 level as the level at which it is
    appropriate to reject the null hypothesis.

78
Descriptive vs. Inferential Statistics
  • When an alpha of .05 is used, we are, in effect,
    willing to be wrong 5 times in 100 in rejecting
    the null hypothesis.
  • We are taking a calculated risk that we might be
    wrong 5% of the time.
  • This type of error is known as a Type I Error:
    the error of rejecting the null hypothesis when
    it is correct.
  • When the probability is low that the null
    hypothesis is correct, we reject the null
    hypothesis by declaring the result to be
    statistically significant.

79
Descriptive vs. Inferential Statistics
  • You will also frequently see p values of less
    than .05 reported.
  • The most common are p < .01 (less than 1 in 100)
    and p < .001 (less than 1 in 1000).
  • When a result is statistically significant at
    these levels, investigators can be more confident
    of not committing a Type I error.

80
Descriptive vs. Inferential Statistics
  • To review
  • .06 level: not significant; do not reject the
    H0.
  • .05 level: significant; reject the H0.
  • .01 level: more significant; reject the H0 with
    more confidence than at the .05 level.
  • .001 level: highly significant; reject the H0
    with even more confidence than at the .01 or .05
    levels.
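The decision rule reviewed above is a simple comparison of p to the chosen alpha level. A minimal sketch (the function name `decide` is our own, for illustration):

```python
# The decision rule for null hypothesis testing:
# reject H0 only when p falls below the chosen alpha level.
def decide(p, alpha=0.05):
    """Return the decision about H0 given a p value and an alpha level."""
    return "reject H0" if p < alpha else "fail to reject H0"

print(decide(0.06))               # fail to reject H0
print(decide(0.04))               # reject H0
print(decide(0.009, alpha=0.01))  # reject H0, with more confidence
```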

81
Descriptive vs. Inferential Statistics
  • Should you decide to use some level other than
    .05, you should decide that in advance of
    examining the data.
  • When you require a lower probability before
    rejecting the null hypothesis (e.g., .01 instead
    of .05), you are increasing the likelihood that
    you will make a Type II Error.
  • A Type II Error is the error of failing to reject
    the null hypothesis when it is false.
  • This type of error can have serious consequences.

82
Descriptive vs. Inferential Statistics
  • Although either decision we make about the null
    hypothesis (reject or fail to reject) may be
    wrong, by using inferential statistics and
    reporting the probability level, we inform others
    of the likelihood that we were incorrect when we
    decided to reject the null hypothesis.

83
Parametric Statistics
  • The nature of the data, continuous versus
    categorical, is an important consideration in
    deciding whether to use parametric or
    nonparametric statistical tests.
  • A parametric statistical test specifies certain
    conditions about the distribution of responses in
    the population from which the research sample was
    drawn.
  • Specifically, the data must satisfy the following
    assumptions

84
Parametric Statistics
  • The assumption of normality: that the samples
    upon which the research is done were selected
    from populations which are normally distributed;
  • The assumption of homogeneity of variance: that
    the spread (variance or standard deviation) of
    the dependent variable (e.g., score) within the
    groups tested must be statistically equal. That
    is, the shape of each group's distributional
    curve should be equal; and
  • The assumption that the nature of the data is
    continuous.

85
Parametric Statistics
  • If the data collected satisfy all three
    assumptions, parametric procedures are
    recommended.
  • If any of these three assumptions are violated by
    the data, then non-parametric statistical tests
    should be used.
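The first two assumptions can themselves be checked with inferential tests. A sketch using SciPy's Shapiro-Wilk and Levene tests (an illustration with hypothetical data; the slides do not prescribe any particular software):

```python
# Checking the normality and homogeneity-of-variance assumptions
# with SciPy before choosing a parametric test (hypothetical data).
from scipy.stats import shapiro, levene

group_1 = [23, 25, 22, 27, 24, 26, 23, 25]
group_2 = [30, 28, 31, 29, 27, 32, 30, 28]

# Shapiro-Wilk: a large p value gives no evidence against normality.
_, p_norm = shapiro(group_1)

# Levene: a large p value suggests the group variances are equal.
_, p_var = levene(group_1, group_2)

print(f"normality p = {p_norm:.3f}, equal-variance p = {p_var:.3f}")
```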

86
t-Test
  • t-tests are used to compare the means of two
    samples for statistical significance.
  • The t-test can be used to test two groups on a
    pre-test only; two groups on a post-test only;
    one group on a pre-test versus post-test; or two
    groups on gain scores (e.g., post-test minus
    pre-test).
  • The t-test rests on the following principles
  • The larger the sample, the less likely the
    difference between two means was created by
    sampling errors;

87
t-Test
  • The larger the difference between the two means,
    the less likely that the difference was created
    by sampling errors; and
  • The smaller the variance among the subjects, the
    less likely that the difference between two means
    was created by sampling errors.
  • There are two types of t tests.
  • One is for independent (uncorrelated) data and
    the other is for dependent (correlated) data.
  • Independent data are obtained when there is no
    matching or pairing of subjects across groups
    dependent data are obtained when each score in
    one set of scores is paired with a score in
    another set.
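Both types of t test are available in SciPy. A sketch with hypothetical scores (independent groups first, then the same subjects measured pre and post):

```python
# Independent and dependent t tests with SciPy (hypothetical scores).
from scipy import stats

# Independent (uncorrelated) samples: no pairing across groups.
group_a = [78, 84, 81, 90, 76, 88, 83, 79]
group_b = [72, 75, 70, 80, 69, 74, 77, 71]
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

# Dependent (correlated) samples: each pre-test score is paired
# with the same subject's post-test score.
pre = [60, 65, 70, 58, 62, 68]
post = [66, 70, 74, 63, 65, 75]
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")
print(f"dependent:   t = {t_rel:.2f}, p = {p_rel:.4f}")
```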

88
ANOVA
  • Closely related to the t test is analysis of
    variance (ANOVA).
  • The ANOVA is the most traditionally and widely
    accepted form of statistical analysis.
  • ANOVA is used to test the difference(s) among two
    or more means utilizing a single statistical
    operation.
  • ANOVA accomplishes its statistical testing by
    comparing the variance between the groups to the
    variance within the groups.
  • A resulting F-ratio (variance between groups
    divided by variance within groups) and an
    associated significance level are found.
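A one-way ANOVA of this kind can be run with SciPy's `f_oneway`. A sketch with three hypothetical instruction-method groups:

```python
# One-way ANOVA across three groups with SciPy (hypothetical data).
# F = variance between groups / variance within groups.
from scipy import stats

method_1 = [80, 85, 82, 88, 84]
method_2 = [74, 78, 72, 77, 75]
method_3 = [90, 92, 88, 91, 89]

f_ratio, p_value = stats.f_oneway(method_1, method_2, method_3)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```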

89
ANOVA
  • One-way, or single-factor ANOVA, is used when
    subjects are classified according to only one
    categorical group (e.g., drug group or method of
    instruction).
  • Two-way, or two-factor ANOVA, is used when each
    subject is classified in two ways such as 1) drug
    group and 2) gender.
  • A two-way ANOVA examines two main effects (drug
    level and gender) and one interaction (drug level
    x gender).
  • This is done by computing three values of F (one
    for each of the three null hypotheses) and
    determining the probability associated with each.

90
Pearson r
  • The Pearson product-moment linear correlation
    coefficient r is a very popular parametric
    statistical measure of the relationship between
    two continuous data variables.
  • Pearson r is used when the researcher wishes to
    study how a change in one variable may tend to be
    related to a change in a second variable.
  • Since Pearson r is a measure of relationship,
    data on two variables are collected from the same
    group of subjects and paired.

91
Pearson r
  • A resulting r value and an associated
    significance level would assess both the
    direction (+, direct, or -, inverse) and the
    strength (between 0 and 1.00) of the relationship
    between the two variables.
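SciPy's `pearsonr` returns both the r value and its significance level. A sketch with hypothetical paired data (study hours and exam scores from the same subjects):

```python
# Pearson r between two paired continuous variables with SciPy
# (hypothetical study-hours and exam-score data, one pair per subject).
from scipy import stats

hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 78]

r, p = stats.pearsonr(hours, score)
print(f"r = {r:.3f}, p = {p:.4f}")  # r near +1.00: strong direct relationship
```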

92
Non-Parametric Statistics
  • A non-parametric statistical test is based on a
    model that specifies only very general conditions
    and none regarding the specific form of the
    distribution from which the sample was drawn.
  • Non-parametric tests need none of the three
    parametric assumptions satisfied for their proper
    application.
  • Non-parametric tests can be applied in almost any
    research situation.
  • Usually, the non-parametric methods serve as the
    statistical tests for nominal or ordinal scaled
    data.

93
Chi-Square
  • The most popular of all non-parametric
    inferential statistical methods is the chi-square
    (χ2).
  • χ2 tests for differences between categorical
    variables (nominal or ordinal).
  • Such data do not permit the computation of means
    and standard deviations.
  • Instead, we normally report the number of
    subjects who were found in each category (the
    frequency) and the corresponding proportions or
    percentages.

94
Chi-Square
  • There are both one-way and two-way chi-square
    procedures.
  • A one-way χ2 (also known as a goodness of fit
    chi-square) is used if one categorical variable
    is involved, say political affiliation.
  • The one-way chi-square would test for differences
    in popularity between the political party
    candidates
  • Candidate Smith n = 110
  • Candidate Doe n = 90
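The candidate-popularity test above can be run with SciPy's `chisquare`, which by default compares the observed counts against equal expected frequencies (100 per candidate here):

```python
# One-way (goodness-of-fit) chi-square for the candidate example:
# observed counts of 110 and 90 tested against an even 100/100 split.
from scipy.stats import chisquare

observed = [110, 90]           # Smith, Doe
chi2, p = chisquare(observed)  # expected defaults to equal frequencies
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # chi-square = 2.00, p = 0.1573
```

With these counts the difference in popularity is not statistically significant at the .05 level.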

95
Chi-Square
  • The two-way chi-square is used when two
    categorical variables are to be compared
    (political candidate and gender)
  •             Candidate Jones   Candidate Black
  • Males       n = 80            n = 120
  • Females     n = 120           n = 80
  • There are two types of chi-square tests.
  • A chi-square test of homogeneity involves two or
    more populations, as above, on one outcome
    variable.
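The candidate-by-gender table above maps directly onto SciPy's `chi2_contingency` (note that for a 2x2 table SciPy applies the Yates continuity correction by default):

```python
# Two-way chi-square on the candidate-by-gender table with SciPy.
from scipy.stats import chi2_contingency

#            Jones  Black
table = [[80, 120],   # males
         [120, 80]]   # females

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

Here candidate preference clearly differs by gender, so the null hypothesis would be rejected.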

96
Chi-Square
  • A chi-square test of independence involves one
    population, classified in two ways.
  • For example, a random sample of college students
    was asked whether they think that IQ tests
    measure innate intelligence and whether they had
    taken a tests and measurements course.
  • The data would render two categories of
    information (innate opinion and course taking).

97
Wilcoxon Matched Pairs Sign Test
  • The Wilcoxon Matched Pairs Signed-Ranks test is a
    commonly used non-parametric analog of the paired
    t test that utilizes information about both the
    magnitude and direction of difference for pairs
    of scores.
  • In the behavioral sciences, it is the commonly
    used non-parametric test of the significance of
    difference between dependent samples.
  • This test is appropriate for studies involving
    repeated measures, as in the pre-test and
    post-test designs in which the same subjects
    serve as their own controls or in cases which use
    matched pairs.

98
Wilcoxon Matched Pairs Sign Test
  • Suppose we wish to determine whether preschool
    children with impairments in both grammar and
    phonology will make more speech sound errors when
    imitating grammatically complete sentences than
    when imitating relatively simple sentences that
    are comparable in length.
  • We are using one set of children and looking at
    the correlation between grammar complexity and
    speech sound errors.

99
Wilcoxon Matched Pairs Sign Test
  • When testing dependent or correlated samples, the
    Wilcoxon matched pairs sign test will determine
    which member of a pair of scores is larger or
    smaller than the other (as denoted by + or -,
    respectively), and the ranking of such size
    differences.
  • Paired scores are organized into a table, and the
    difference is found (+ if the first of the pair
    is larger, - if the first in the pair is
    smaller).
  • The sign of a number has no real mathematical
    significance; it just serves to mark the
    direction of the difference between the pairs of
    scores.

100
Wilcoxon Matched Pairs Sign Test
  • Then the differences are ranked according to
    their relative magnitude, assigning an average
    rank score to each tie irrespective of whether
    the sign is positive or negative.
  • Zero difference scores between pairs (d = 0) are
    dropped from the analysis.
  • Therefore, the total number of signed ranks (n)
    is used in determining the criterion for
    rejecting the null hypothesis.
  • Finally, the absolute values of the ranked
    difference scores having the least frequent sign
    are summed (T).
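SciPy's `wilcoxon` carries out this procedure, including dropping zero differences by default. A sketch with hypothetical pre-test/post-test scores:

```python
# Wilcoxon matched-pairs signed-ranks test with SciPy on paired
# pre-test/post-test scores (hypothetical data).
from scipy.stats import wilcoxon

pre =  [10, 12, 9, 14, 11, 8, 13, 10]
post = [13, 15, 9, 17, 14, 12, 16, 11]

# The one zero difference (d = 0) is dropped by default, matching
# the procedure described above; the statistic is T, the smaller
# sum of like-signed ranks.
stat, p = wilcoxon(pre, post)
print(f"T = {stat}, p = {p:.4f}")
```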

101
Mann-Whitney U Test
  • The Mann-Whitney U test looks at whether the
    distribution of scores for one random sample is
    significantly different from the distribution of
    another independent random sample.
  • It is concerned with the equality of medians
    rather than means.
  • It is commonly used when the parametric t test's
    assumptions of normality and homogeneity of
    variance are violated.

102
Mann-Whitney U Test
  • Suppose we are interested in knowing whether the
    physical status of newborns is related to their
    subsequent development of receptive language.
  • For this purpose, we conduct a prospective study
    in which Apgar scores are collected on a random
    sample.
  • Such scores are used to denote the general
    condition of the infant shortly after birth based
    on five physical indices including skin color,
    heart rate, respiratory effort, muscle tone, and
    reflex irritability.

103
Mann-Whitney U Test
  • The maximum score of 10 is indicative of
    excellent physical condition.
  • Using these numerical values as our independent
    variable, we divide our sample into two groups:
    10 children with high Apgar scores (greater than
    6) and 8 children with low Apgar scores (less
    than 4).
  • Composite language scores, obtained from these
    same children at ages 3 to 3.5 years on the
    appropriate subtests of the CELF-P serve as the
    dependent variable.

104
Mann-Whitney U Test
  • Our research hypothesis is that there is a
    difference in the receptive language ability of
    children who scored high on the Apgar scale
    versus those who scored low.
  • Data are organized by category (language scores
    for the high Apgar group and language scores for
    the low Apgar group) and then the language scores
    are ranked, just like with the Wilcoxon matched
    pairs sign test.

105
Mann-Whitney U Test
  • This time, though, the ranks are summed for each
    category.
  • A calculation is performed and the smaller of U1
    and U2 serves as the observed value which is
    compared to a tabular critical value for
    rejecting or maintaining the null hypothesis.
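The whole Apgar comparison can be sketched with SciPy's `mannwhitneyu` (the language scores below are hypothetical stand-ins for CELF-P composites):

```python
# Mann-Whitney U test with SciPy comparing the language scores of the
# high- and low-Apgar groups (hypothetical scores).
from scipy.stats import mannwhitneyu

high_apgar = [88, 92, 85, 90, 95, 87, 91, 89, 93, 86]  # n = 10
low_apgar = [70, 75, 72, 78, 68, 74, 71, 76]           # n = 8

u_stat, p = mannwhitneyu(high_apgar, low_apgar)
print(f"U = {u_stat}, p = {p:.4f}")
```

Because the two score distributions do not overlap at all in this made-up data, U takes its extreme value and the null hypothesis is rejected.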