Data in empirical research: Some fundamental issues


1
Data in empirical research: Some fundamental issues

Daniel Gile
daniel.gile_at_yahoo.com
www.cirinandgile.com

2
Reminder: Data, the foundation of progress in CSA (1)

In HSA, scholars can observe reality, and then speculate and theorize with much freedom
The norms of caution and rigorous inferencing make this impossible in CSA
In CSA, theoretical speculation is acceptable:
- As a starting point for further empirical exploration
- As a basis for theory construction, but the theory will need to be tested empirically
- As tentative ideas to explain findings
But unlike the situation in HSA, in CSA all progress is by definition based on data and their analysis

3
Reminder: Data, the foundation of progress in CSA (2)

So the quality of research is limited by the quantity and quality of the data on which it is based
In many cases, it is difficult to:
- Collect valid, relevant data
- Measure the data in a way that will help advance towards finding an answer to the research question(s)
- Extrapolate from the data that can be collected on part of the environment or population to the whole environment or population to which the research question(s) apply
If the data are not valid or not representative of the population, no reliable inferences about the population can be made
If you cannot measure the data adequately, they are of limited use

4
Collecting data: Access and indicators

Access to the data is often problematic
- Cost, confidentiality, difficulty of detection
- Cost and complexity of technical equipment
- Physical access to the location
- Permission to observe/record
But more fundamentally:
- How do you gain access to the content of dreams?
- How do you gain access to mental processes?
- How do you gain access to skills for observation?
You cannot observe them directly
What you generally observe (and measure) are indicators
In other words, data are not the phenomenon itself, but an indicator of the phenomenon (more on this later)

5
Collecting data: Identifying target data

When collecting data on a phenomenon or an indicator, it is not always easy to distinguish the target data from other information picked up
When studying language skills and using errors and infelicities as an indicator:
How do you identify errors and infelicities in linguistic data?
When studying translation tactics (decisions made when confronting a problem):
How do you distinguish between the result of a tactic and the result of insufficient skills? (e.g. omissions, small semantic changes)

6
Problems with data validity (1)

Reminder: Research explores various phenomena in reality
Generally, data are not the phenomena themselves, but something believed to correspond to them in some way
For instance, when studying voting behavior, the data used, e.g. the number of ballots cast in favor of a certain candidate, are not the voting behavior itself. They are something that reflects voting behavior.
One could say that generally, data are indicators, though the term "indicator" tends to be used for something that is even more remotely connected to the reality it is supposed to represent
Data are said to be valid if they correspond strongly to what they are supposed to correspond to.

7
Problems with data validity (2)

Data are valid if one or some of their features correspond strongly to what they are supposed to correspond to in the object of study.
Such correspondence may be required for detection only, i.e. if and only if a particular feature of the object of study exists, the data take on a particular feature, and vice versa
(the presence of particular objects on archaeological sites is valid data to indicate skills/beliefs/rituals in the population which lived on these particular sites)
Quantitative correspondence may be required in other cases
(e.g. measuring the amount of radioactivity, of a particular chemical substance, etc.)

8
Data validity: uncertain correspondence (1)

Voting statistics are a valid indicator of voting behavior
What about voting intentions as stated in interviews? Are they valid as an indicator of voting behavior?
They say something about voting behavior, but that something is not enough to determine how people are going to vote, because:
- Some people may change their mind
- Some people do not speak the truth

[Diagram: Phenomenon, Data]
9
Data validity: uncertain correspondence (2)

One frequent problem with data validity is the uncertain correspondence between the data and the target phenomenon
e.g. native speakers' assessment of a non-native speaker's mastery of their language
(How sensitive are they to errors and infelicities? What are their personal norms? What are their expectations?)
Students' assessment of their teachers
(Personal bias, political correctness)
Problems arise because of interference from affective factors (often subconscious), such as the desire to preserve self-image
Ex. In Translation Studies, the relative weight of quality components
This problem is particularly frequent in behavioral sciences

10
Data validity: partial correspondence (1)

Are police reports about sexual assaults a valid indicator of actual sexual assault activity in a given city?
Most police reports about sexual assaults probably report genuine sexual assaults, but there are many assaults which are never reported because the victims are afraid or ashamed to report them
So the data are valid for one part of the phenomenon only

[Diagram: Phenomenon, Data]
11
Data validity: partial correspondence (2)

When data are valid for one part of the phenomenon only, whereas exploration of the whole phenomenon is sought:
How safe is it to extrapolate from information on part of the phenomenon only?
(This is distinct from the issue of representativeness, taken up later)
Example: a single test to test language proficiency?
Language proficiency is multi-dimensional (declarative knowledge, procedural knowledge, distinct skills like pronunciation, fluency, reading ability, listening comprehension ability, flexibility in using various registers)

12
Validity of other research environment components

The validity of the data/the indicator chosen is not the only validity issue in empirical research
As will be seen later, especially in experimental research, ecological validity can be an issue:
- Task
- Environment
- Participants

13
Measurable data

Often, advancing towards an answer to the research question(s) requires some kind of measurement of data (intensity, magnitude, amount, frequency)
In some cases, this is rather easy (thermometer, number of ballots cast, money/time spent)
In other cases, it is difficult (intensity of feelings, amount of deviation from a norm)

14
Representative data (1)

Generally, it is not possible to have data on the whole object of study (cost, time including future, physical access)
You can only access data on part of it
They may be valid and measurable, but are they representative of the whole object of study, or of part of it only?

[Diagram: Data, Phenomenon]
15
Representative data (2)

If the phenomenon is very homogeneous
If the accessible part has the same relevant features as the whole
The data are said to be representative
If not, you cannot legitimately make inferences from your sample about the whole

[Diagram: Data, Phenomenon]
16
Validity and Representativeness

They are not the same
Data can be valid, that is, provide reliable indications on part of a phenomenon/object of study (for instance, on a sample of people from a population), without being representative
This is because the characteristics of the sample may differ from the characteristics of the population (for instance, when estimating the average height of a population, if the sample of people used has a high proportion of basketball players)
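
The height example can be sketched in a few lines of Python. All heights below are invented for illustration (they are not from the presentation), and the variable names are hypothetical. Each individual measurement is assumed to be valid, yet the sample that over-represents tall people gives a misleading picture of the population.

```python
from statistics import mean

# Hypothetical heights in cm (illustrative numbers, not real data).
population = [158, 162, 165, 168, 170, 172, 175, 178, 181, 199, 203, 206]

# A sample with a high proportion of very tall people (e.g. basketball players).
biased_sample = [178, 199, 203, 206]

# A sample spread across the population (every third member of the sorted list).
spread_sample = population[::3]

population_mean = mean(population)
biased_mean = mean(biased_sample)
spread_mean = mean(spread_sample)

print(f"Population mean:    {population_mean:.1f} cm")
print(f"Biased sample mean: {biased_mean:.1f} cm")
print(f"Spread sample mean: {spread_mean:.1f} cm")
```

Every height in the biased sample is a correct measurement, so validity is not the issue here. The problem is only that the sample's characteristics differ from those of the population.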

17
Priorities and strategies

Validity is particularly important
Scientifically legitimate inferences on a phenomenon can only be made if the data are valid
Representativeness is less of a problem, provided no generalization is asserted
Measurability can be important, if only to measure the actual impact of a particular factor or feature on the object of study
Sometimes, measurability can be constructed (scales)
But limited measurability does not mean nothing can be learned about the object of study → Qualitative research

18
The effects of variability

One other important issue in empirical research is variability
Variability can be intrinsic to the phenomenon (for instance in meteorological phenomena)
It can also be a feature of the data collected, due to:
- Intrinsic variability in the phenomenon and/or
- Heterogeneity in the phenomenon and/or
- Variability in the collection procedures
Its effects can be very large

19
CASE STUDY (FICTION): THE EFFECT OF EXPERIENCE ON TRANSLATION QUALITY

Suppose you want to investigate the effect of experience on translation quality
Suppose that in reality, on average, there is a fast progression along the learning curve during the first 5 years, and over the next decades, translators continue to improve, but at a slower and slower rate

20
REAL AVERAGE PERFORMANCE VS. EXPERIENCE
As measured by some valid indicator on a scale from 1 to 10
Exper.  0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Qual.   1      5      7       8       8.5     8.8
21
Real average learning curve
22
Effects of attitude

- The translators' attitude towards translation may influence the quality of their work.
- Attitudes may change over time
- Suppose that attitudes are very positive in the beginning, that they become negative after a while because translators are disappointed with market conditions, and that they gradually become more positive when they adapt to the situation.

23
Experience vs. Attitude

Very positive to very negative to positive

Exp. 0 yrs 5 yrs 10 yrs 15 yrs 20 yrs 25 yrs
Attit. - - - -
24
The effect of attitude: two scenarios
Exp.          0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Large influ.  3      2      -3      -1      1       1
Small influ.  0.3    0.2    -0.3    -0.1    0.1     0.1
25
The effect of attitude
26
The effect of attitude

- In the small influence scenario, the output
pattern is only changed marginally
- In the large influence scenario, it is changed
considerably. In particular, real improvement
seems to occur only after 10 years of experience.
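
The two scenarios can be reproduced numerically from the tables on slides 20 and 24. The sketch below assumes that the attitude effect is simply added to the real average quality score. The slides do not state how the two combine, so this additive rule and the function name `observed_quality` are illustrative assumptions.

```python
# Values taken from the tables on slides 20 and 24.
experience_years = [0, 5, 10, 15, 20, 25]
true_quality = [1, 5, 7, 8, 8.5, 8.8]
large_influence = [3, 2, -3, -1, 1, 1]
small_influence = [0.3, 0.2, -0.3, -0.1, 0.1, 0.1]


def observed_quality(quality, influence):
    """Add the attitude effect to the true quality, point by point.

    Additivity is an assumption made for this sketch.
    """
    return [round(q + a, 2) for q, a in zip(quality, influence)]


observed_large = observed_quality(true_quality, large_influence)
observed_small = observed_quality(true_quality, small_influence)

for years, true, small, large in zip(
    experience_years, true_quality, observed_small, observed_large
):
    print(f"{years:>2} yrs  true={true:<4}  small={small:<4}  large={large}")
```

Under this assumption, the large-influence series is no higher at 10 years than at 0 years, which matches the remark that real improvement seems to occur only after 10 years of experience.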

27
Controllability

- Experimenters may be able to control attitude, for instance by telling participants that the quality of their output is important, or that they will be assessed by peers, etc.
- But it is not realistic to assume they can control everything: the participants' personality, fatigue, biorhythm, likes or dislikes of certain types of texts, themes, etc.

28
The effect of uncontrolled variability

Assume a variability of up to 30%, either intrinsic or from uncontrolled factors

Exp.  0 yrs  5 yrs  10 yrs  15 yrs  20 yrs  25 yrs
Var.  30%    -30%   -30%    30%     -20%    -30%
29
The effect of uncontrolled variability
30
The effect of variability

With such variability, very common in empirical studies in translation and interpreting (actually, in such studies variability is often of several hundred percent), the underlying true pattern is severely distorted
- In particular, from the data, it seems that improvement occurs for 15 years, after which there is a steady decline in the quality of the translation output.
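
The distortion can be checked against the numbers on slides 20 and 28. The sketch below assumes that the variability values are percentages applied multiplicatively to the real average quality. That reading is an interpretation of the slide, not something it states outright, and the function name `apply_variability` is hypothetical.

```python
# Values taken from the tables on slides 20 and 28.
experience_years = [0, 5, 10, 15, 20, 25]
true_quality = [1, 5, 7, 8, 8.5, 8.8]
variability_percent = [30, -30, -30, 30, -20, -30]


def apply_variability(quality, percent):
    """Scale each quality score by its percentage deviation.

    Treating the deviations as multiplicative percentages is an assumption.
    """
    return [round(q * (1 + p / 100), 2) for q, p in zip(quality, percent)]


observed = apply_variability(true_quality, variability_percent)

for years, true, seen in zip(experience_years, true_quality, observed):
    print(f"{years:>2} yrs  true={true:<4}  observed={seen}")

# Experience level at which the observed series peaks.
peak_years = experience_years[observed.index(max(observed))]
print(f"Observed quality peaks at {peak_years} years")
```

The real curve rises at every step, but under this assumption the observed series peaks at 15 years and falls afterwards, which is the false trend the slide describes. The observed value at 15 years also overshoots the top of the 1 to 10 scale, a side effect of applying a percentage to a score near the ceiling.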

31
Consequences and conclusion (1)

Variability is a major enemy of research, in that it is likely to hide true trends and suggest false trends.
In experiments, some variability is counterbalanced by control over relevant variables, both in sampling and in the control of environmental and independent variables
Variability is further reduced by strict design and implementation of the experimental procedure
Replications also reduce the effects of variability by providing data for different constellations of parameters

32
Consequences and conclusion (2)

But in behavioral sciences, residual variability is often very large
If you plan to do experimental research, expect to find high variability, and do not be disappointed if this happens.
Unless you need to arrive at a clear-cut result, results that are not clear-cut can also be of interest
They may show, for instance, that there is no regular, clear superiority of one method or one condition over another
So don't let the probability of not reaching significance stop you from doing the research.

33
The sensitivity of indicators/tools (1)

The concepts of signal and noise (from radio transmission)
In empirical research, when seeking to collect data, you need tools with a certain sensitivity
For instance, casual listeners will not necessarily spot traces of a foreign accent or infelicities in a non-native speaker's speech
Their sensitivity to these phenomena may be too low, and they will miss the signal which is supposed to be detected
Other listeners may be too sensitive and mistake native deviations from norms for signs of non-native language use (certain violations of rules of grammar, false cognates)

34
Sensitivity of data collection tools (2)

[Diagram: sensitivity scale with three levels, a, b and c]
At level a: Not sensitive enough. Does not pick up the signal, or picks up part of it only
At level b: Appropriate sensitivity. Picks up the signal, not the noise
At level c: Too sensitive. Picks up the signal and the noise
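
The three sensitivity levels can be illustrated with a toy detector. Everything in this sketch is hypothetical: the event strengths, the thresholds and the `detect` function are invented for illustration. It also assumes that signal and noise differ clearly in strength, which is often not the case in practice.

```python
# Hypothetical events: (label, strength). Strong events are signal, weak ones are noise.
events = [("signal", 8), ("signal", 6), ("noise", 3), ("signal", 7), ("noise", 2)]


def detect(observed_events, threshold):
    """Return the labels of events whose strength reaches the detection threshold.

    A lower threshold corresponds to a more sensitive tool.
    """
    return [label for label, strength in observed_events if strength >= threshold]


level_a = detect(events, threshold=7.5)  # not sensitive enough
level_b = detect(events, threshold=5)    # appropriate sensitivity
level_c = detect(events, threshold=1)    # too sensitive

print("Level a:", level_a)
print("Level b:", level_b)
print("Level c:", level_c)
```

At level a the tool picks up only part of the signal, at level b it picks up all of the signal and none of the noise, and at level c it picks up both.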
35
The sensitivity of indicators/tools (3)

Very high sensitivity, which may pick up the noise (i.e. non-signal), is all right if it is then possible to filter out the noise from the signal
But often, this is not possible, because the noise is very similar to the signal
Other tactics may help
One is triangulation, i.e. using a different method to throw a different light on the phenomenon/data, including qualitative methods