Data in empirical research Some fundamental issues - PowerPoint PPT Presentation

1
Data in empirical research: Some fundamental
issues
  • Daniel Gile
  • daniel.gile_at_yahoo.com
  • www.cirinandgile.com

2
Reminder: Data, the foundation of progress in CSA
(1)
  • In HSA, scholars can observe reality, and then
    speculate and theorize with much freedom
  • The norms of caution and rigorous inferencing
    make this impossible in CSA
  • In CSA, theoretical speculation is acceptable:
  • As a starting point for further empirical
    exploration
  • As a basis for theory construction, but the
    theory will need to be tested empirically
  • As tentative ideas to explain findings
  • But unlike the situation in HSA, in CSA all
    progress is by definition based on data and
    their analysis

3
Reminder: Data, the foundation of progress in CSA
(2)
  • So the quality of research is limited by the
    quantity and quality of the data on which it is
    based
  • In many cases, it is difficult to:
  • Collect valid, relevant data
  • Measure the data in a way that will help advance
    towards finding an answer to the research
    question(s)
  • Extrapolate from the data that can be collected
    on part of the environment or population to the
    whole environment or population to which the
    research question(s) apply
  • If the data are not valid or representative of
    the population, no reliable inferences on the
    population can be made
  • If you cannot measure them adequately, they are
    of limited use

4
Collecting data: Access and indicators
  • Access to the data is often problematic
  • Cost, confidentiality, difficulty of detection
  • Cost and complexity of technical equipment
  • Physical access to the location
  • Permission to observe/record
  • But more fundamentally:
  • How do you gain access to the content of dreams?
  • How do you gain access to mental processes?
  • How do you gain access to skills for observation?
  • You cannot observe them directly
  • What you generally observe (and measure) are
    indicators
  • In other words, data are not the phenomenon
    itself, but an indicator of the phenomenon (more
    on this later)

5
Collecting data: Identifying target data
  • When collecting data on a phenomenon or an
    indicator, it is not always easy to distinguish
    the target data from other information picked up
  • When studying language skills and using errors
    and infelicities as an indicator, how do you
    identify errors and infelicities in linguistic
    data?
  • When studying translation tactics (decisions
    made when confronting a problem), how do you
    distinguish between the result of a tactic and
    the result of insufficient skills? (e.g.
    omissions, small semantic changes)

6
Problems with data validity (1)
  • Reminder: Research explores various phenomena in
    reality
  • Generally, data are not the phenomena themselves,
    but something believed to correspond to them in
    some way
  • For instance, when studying voting behavior, the
    data used, e.g. the number of ballots cast in
    favor of a certain candidate, are not the voting
    behavior itself. They are something that reflects
    voting behavior.
  • One could say that generally, data are indicators
  • Though the term "indicator" tends to be used for
    something that is even more remotely connected
    to the reality it is supposed to represent
  • Data are said to be valid if they correspond
    strongly to what they are supposed to correspond
    to.

7
Problems with data validity (2)
  • Data are valid if one or some of their features
    correspond strongly to what they are supposed to
    correspond to in the object of study.
  • Such correspondence may be required for detection
    only
  • i.e. if and only if a particular feature of the
    object of study exists, the data take on a
    particular feature and vice-versa
  • (the presence of particular objects on
    archaeological sites is valid data to indicate
    skills/beliefs/rituals in the population which
    lived in these particular sites)
  • Quantitative correspondence may be required in
    other cases
  • (e.g. measuring the amount of radioactivity or
    of a particular chemical substance, etc.)

8
Data validity: uncertain correspondence (1)
  • Voting statistics are a valid indicator of voting
    behavior
  • What about voting intentions as stated in
    interviews? Are they valid as an indicator of
    voting behavior?
  • They say something about voting behavior, but
    that something is not enough to determine how
    people are going to vote, because:
  • Some people may change their mind
  • Some people do not speak the truth

[Diagram: Phenomenon and Data]
9
Data validity: uncertain correspondence (2)
  • One frequent problem with data validity is the
    uncertain correspondence between the data and the
    target phenomenon
  • e.g. Native speakers' assessment of a non-native
    speaker's mastery of their language
  • (How sensitive are they to errors and
    infelicities? What are their personal norms? What
    are their expectations?)
  • Students' assessment of their teachers
  • (Personal bias, political correctness)
  • Problems arise because of interference from
    affective factors (often subconscious): desire
    to preserve self-image
  • Ex. In Translation Studies, the relative weight
    of quality components
  • This problem is particularly frequent in
    behavioral sciences

10
Data validity: partial correspondence (1)
  • Are police reports about sexual assaults a valid
    indicator of actual sexual assault activity in a
    given city?
  • Most police reports about sexual assaults
    probably report genuine sexual assaults, but
    there are many which are never reported because
    the victims are afraid to report them or ashamed
  • So the data are valid for one part of the
    phenomenon only

[Diagram: Phenomenon / ? / Data]
11
Data validity: partial correspondence (2)
  • When data are valid for one part of the
    phenomenon only, whereas exploration of the
    whole phenomenon is sought, how safe is it to
    extrapolate from information on part of the
    phenomenon only?
  • (This is distinct from the issue of
    representativeness, taken up later)
  • Example: a single test to test language
    proficiency?
  • Language proficiency is multi-dimensional
  • (declarative knowledge, procedural knowledge,
    distinct skills like pronunciation, fluency,
    reading ability, listening comprehension ability,
    flexibility in using various registers)

12
Validity of other research environment components
  • The validity of the data/the indicator chosen is
    not the only validity issue in empirical research
  • As will be seen later, especially in experimental
    research, ecological validity can be an issue:
  • Task
  • Environment
  • Participants

13
Measurable data
  • Often, advancing towards an answer to the
    research question(s) requires some kind of
    measurement of data
  • (intensity, magnitude, amount, frequency)
  • In some cases, this is rather easy
  • (thermometer, number of ballots cast, money/time
    spent)
  • In other cases, it is difficult
  • (intensity of feelings, amount of deviation
    from a norm)

14
Representative data (1)
  • Generally, it is not possible to have data on the
    whole object of study
  • (cost, time, including the future, physical
    access)
  • You can only access data on part of it
  • They may be valid and measurable,
  • but are they representative of the whole object
    of study?
  • Or of part of it only?

[Diagram: Data and Phenomenon]
15
Representative data (2)
  • If the phenomenon is very homogeneous, or if the
    accessible part has the same relevant features
    as the whole, the data are said to be
    representative
  • If not, you cannot legitimately make inferences
    from your sample on the whole

[Diagram: Data and Phenomenon]
16
Validity and Representativeness
  • They are not the same
  • Data can be valid, that is, provide reliable
    indications on part of a phenomenon/object of
    study (for instance, on a sample of people from
    a population), without being representative
  • Because it is possible that the characteristics
    of the sample are different from the
    characteristics of the population
  • (for instance, the average height of a
    population, if the sample of people used has a
    high proportion of basketball players)
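
The basketball example can be made concrete with a small simulation. This is only a sketch: the population size, the share of players and all heights are invented for illustration, and the seed is fixed so the run is repeatable. Every height is "measured" without error, so each data point is valid; only the composition of the sample changes.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 people, 1% of them basketball players.
# All heights (in cm) are invented for illustration.
general = [random.gauss(170, 8) for _ in range(9_900)]
players = [random.gauss(200, 6) for _ in range(100)]
population = general + players

# Sample A: 200 people drawn at random from the whole population.
sample_random = random.sample(population, 200)

# Sample B: 200 people, half of them basketball players
# (a high proportion compared with the 1% in the population).
sample_biased = random.sample(general, 100) + random.sample(players, 100)

population_mean = mean(population)
random_mean = mean(sample_random)
biased_mean = mean(sample_biased)

print(f"population mean:    {population_mean:.1f} cm")
print(f"random sample mean: {random_mean:.1f} cm")
print(f"biased sample mean: {biased_mean:.1f} cm")
```

Both samples contain accurate heights, but only the random one supports an inference about the average height of the population; the biased one overestimates it by a wide margin.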

17
Priorities and strategies
  • Validity is particularly important
  • Scientifically legitimate inferences on a
    phenomenon can only be made if the data are valid
  • Representativeness is less of a problem,
    provided no generalization is asserted
  • Measurability can be important
  • If only to measure the actual impact of a
    particular factor or feature on the object of
    study
  • Sometimes, measurability can be constructed
  • (scales)
  • But limited measurability does not mean nothing
    can be learned about the object of study
    → Qualitative research

18
The effects of variability
  • One other important issue in empirical research
    is variability
  • Variability can be intrinsic to the phenomenon
  • (for instance in meteorological phenomena)
  • It can also be a feature of the data collected
  • Due to intrinsic variability in the phenomenon
    and/or
  • Heterogeneity in the phenomenon and/or
  • Variability in the collection procedures
  • Its effects can be very large

19
CASE STUDY (FICTION): THE EFFECT OF EXPERIENCE ON
TRANSLATION QUALITY
  • Suppose you want to investigate the effect of
    experience on translation quality
  • Suppose that in reality, on average, there is a
    fast progression along the learning curve during
    the first 5 years, and over the next decades,
    translators continue to improve, but at a lower
    and lower speed

20
REAL AVERAGE PERFORMANCE VS. EXPERIENCE
As measured by some valid indicator on a scale from
1 to 10

Exper.  0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Qual.   1      5      7       8       8.5     8.8
21
Real average learning curve
22
Effects of attitude
  • The translators' attitude towards translation
    may influence the quality of their work.
  • Attitudes may change over time
  • Suppose that attitudes are very positive in the
    beginning, that they become negative after a
    while because translators are disappointed with
    market conditions, and that they gradually become
    more positive when they adapt to the situation.

23
Experience vs. Attitude
  • Very positive to very negative to positive

Exp.    0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Attit.  +++    ++     ---     -       +       +
24
The effect of attitude: two scenarios

Exp.          0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Large influ.  3      2      -3      -1      1       1
Small influ.  0.3    0.2    -0.3    -0.1    0.1     0.1
25
The effect of attitude
26
The effect of attitude
  • In the small influence scenario, the output
    pattern is only changed marginally
  • In the large influence scenario, it is changed
    considerably. In particular, real improvement
    seems to occur only after 10 years of experience.
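
The two scenarios can be recomputed from the figures given for the fictional case study. The sketch below assumes that the attitude effect is simply added to the real average quality; the slides do not state how the two combine, so the additive rule and the helper name `observed` are assumptions made for illustration.

```python
# Figures from the fictional case study.
years = [0, 5, 10, 15, 20, 25]
real_quality = [1, 5, 7, 8, 8.5, 8.8]
large_influence = [3, 2, -3, -1, 1, 1]
small_influence = [0.3, 0.2, -0.3, -0.1, 0.1, 0.1]


def observed(real, influence):
    """Assumption: attitude shifts quality additively (not stated in the slides)."""
    return [round(q + d, 1) for q, d in zip(real, influence)]


observed_small = observed(real_quality, small_influence)
observed_large = observed(real_quality, large_influence)

print("years  real  small  large")
for y, r, s, l in zip(years, real_quality, observed_small, observed_large):
    print(f"{y:>5}  {r:>4}  {s:>5}  {l:>5}")
```

Under this additive assumption, the small-influence values stay within 0.3 of the real curve, while in the large-influence scenario the observed quality at 10 years equals the observed quality at 0 years, so improvement only becomes visible afterwards.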

27
Controllability
  • Experimenters may be able to control attitude,
    for instance by telling participants that the
    quality of their output is important, or that
    they will be assessed by peers, etc.
  • But it is not realistic to assume they can
    control everything: the participants'
    personality, fatigue, biorhythm, likes or
    dislikes of certain types of texts, themes, etc.

28
The effect of uncontrolled variability
  • Assume a variability of up to 30%, either
    intrinsic or from uncontrolled factors

Exp.  0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Var.  +30%   -30%   -30%    +30%    -20%    -30%
29
The effect of uncontrolled variability
30
The effect of variability
  • With such variability, very common in empirical
    studies in translation and interpreting
    (actually, in such studies variability is often
    of several hundred percent), the underlying true
    pattern is severely distorted
  • In particular, from the data, it seems that
    improvement occurs for 15 years, after which
    there is a steady decline in the quality of the
    translation output.
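
The distortion can be checked by applying the variability figures to the real average curve. The sketch below reads the "Var." row as percentage deviations that scale each real value; this reading is an assumption, chosen because the slides speak of variability in percent, and the result is not clipped to the 1 to 10 scale.

```python
# Figures from the fictional case study.
years = [0, 5, 10, 15, 20, 25]
real_quality = [1, 5, 7, 8, 8.5, 8.8]

# Assumption: the "Var." row gives percentage deviations from the real value.
variability_pct = [30, -30, -30, 30, -20, -30]

observed = [
    round(q * (1 + v / 100), 2)
    for q, v in zip(real_quality, variability_pct)
]

# Year at which the observed (distorted) curve peaks.
peak_year = years[observed.index(max(observed))]

print("years  real  observed")
for y, r, o in zip(years, real_quality, observed):
    print(f"{y:>5}  {r:>4}  {o:>8}")
print("observed peak at", peak_year, "years")
```

With these figures the observed values rise until 15 years and fall afterwards, although the real curve never declines.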

31
Consequences and conclusion (1)
  • Variability is a major enemy of research, in that
    it is likely to hide true trends and suggest
    false trends.
  • In experiments, some variability is
    counter-balanced by the use of control over
    relevant variables, both in sampling and in the
    control of environmental and independent
    variables
  • Variability is further reduced by strict design
    and implementation of the experimental procedure
  • Replications also reduce the effects of
    variability by providing data for different
    constellations of parameters
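
One way to see why replications help is to pool many noisy studies of the same underlying curve. The sketch below is an illustration only, not a procedure taken from the slides: it assumes that each study distorts every point by an independent, uniformly distributed error of up to 30% in either direction, and it pools the replications by simple averaging. The function names are hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

real_quality = [1, 5, 7, 8, 8.5, 8.8]  # fictional curve from the case study


def one_study(real, max_rel_noise=0.30):
    """One simulated study: each point is off by up to +/-30% (assumed uniform)."""
    return [q * (1 + random.uniform(-max_rel_noise, max_rel_noise)) for q in real]


def pooled_estimate(real, n_replications):
    """Average the same point across several independent simulated studies."""
    studies = [one_study(real) for _ in range(n_replications)]
    return [mean(point) for point in zip(*studies)]


def worst_rel_error(estimate, real):
    """Largest relative deviation from the real curve over all points."""
    return max(abs(e - q) / q for e, q in zip(estimate, real))


single = one_study(real_quality)
pooled = pooled_estimate(real_quality, 200)

print(f"single study, worst relative error: {worst_rel_error(single, real_quality):.2f}")
print(f"200 replications pooled:            {worst_rel_error(pooled, real_quality):.2f}")
```

The pooled curve stays close to the real one because independent errors tend to cancel out. Errors shared by all replications, such as a systematic attitude effect, would not cancel in this way.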

32
Consequences and conclusion (2)
  • But in behavioral sciences, residual variability
    is often very large
  • If you plan to do experimental research, expect
    to find high variability, and do not be
    disappointed if this happens.
  • Unless you need to arrive at a clear-cut
    result, results that are not clear cut can also
    be of interest
  • They may show, for instance, that there is no
    regular, clear superiority of one method or one
    condition over another
  • So don't let the probability of not reaching
    significance stop you from doing the research.

33
The sensitivity of indicators/tools (1)
  • The concepts of signal and noise
  • (from radio transmission)
  • In empirical research, when seeking to collect
    data, you need tools with a certain sensitivity
  • For instance, casual listeners will not
    necessarily spot traces of foreign accent or
    infelicities in a non-native speaker
  • Their sensitivity to these phenomena may be too
    low
  • And they will miss the signal which is supposed
    to be detected
  • Other listeners may be too sensitive and mistake
    native deviations from norms for signs of
    non-native language use
  • (certain violations of rules of grammar, false
    cognates)

34
Sensitivity of data collection tools (2)

  • At level a: Not sensitive enough. Does not pick
    up the signal, or picks up part of it only
  • At level b: Appropriate sensitivity. Picks up
    the signal, not the noise
  • At level c: Too sensitive. Picks up the signal
    and the noise

[Diagram: levels a, b and c marked along a sensitivity axis]
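
The three levels can be mimicked with a toy detector. In this sketch a tool is modeled as a single detection threshold on an arbitrary intensity scale; the intensities, the thresholds and the function name `picked_up` are invented for illustration.

```python
# Hypothetical event intensities on an arbitrary scale.
signal = [5.0, 6.0, 7.0]  # what the tool is supposed to detect
noise = [1.0, 2.0, 3.0]   # what it should ignore

# A tool picks up every event at or above its threshold.
# A lower threshold means a higher sensitivity.
thresholds = {"a": 6.5, "b": 4.0, "c": 0.5}


def picked_up(events, threshold):
    """Return the events that the tool registers at this threshold."""
    return [e for e in events if e >= threshold]


results = {
    level: {
        "signal": picked_up(signal, t),
        "noise": picked_up(noise, t),
    }
    for level, t in thresholds.items()
}

for level, found in results.items():
    print(level, "signal:", found["signal"], "noise:", found["noise"])
```

In this toy case level a picks up only part of the signal, level b picks up all of the signal and none of the noise, and level c picks up both. The model only works because signal and noise differ clearly in intensity; when they overlap, no threshold separates them.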
35
The sensitivity of indicators/tools (3)
  • Very high sensitivity, which may pick up the
    noise (i.e. non-signal), is all right if it is
    then possible to filter out the noise from the
    signal
  • But often, this is not possible, because the
    noise is very similar to the signal
  • Other tactics may help
  • One is triangulation, i.e. using a different
    method to throw a different light on the
    phenomenon/data, including qualitative methods