Title: Review of Faking in Personnel Selection
1. Review of Faking in Personnel Selection
Chris D. Fluckinger, University of Akron, cdf12@uakron.edu
Deborah L. Whetzel, Human Resources Research Organization, dwhetzel@humrro.org
Michael A. McDaniel, Virginia Commonwealth University, mamcdani@vcu.edu
Prepared for the International Workshop on Emerging Frameworks and Issues for ST Recruitments, Society for Reliability Engineering, Quality and Operations Management (SREQOM), Delhi, India, September 2008
2.
- We note that Chris D. Fluckinger is the senior
author of our book chapter associated with this
conference. Although not present at the
conference, his contributions to this
presentation were substantial.
3. Goal of this Presentation
- Provide practitioners and researchers with a
solid understanding of the practical issues
related to faking in test delivery and
assessment.
4. Overview
- Typical vs. maximal performance
- The usefulness of different strategies to identify faking
- How faking creates challenges to test delivery and measurement
- Review and critique of common strategies to combat faking
5. Faking
- Faking is a conscious effort to improve one's score on a selection instrument.
- Faking has been described using various terms, including
- Response distortion
- Social desirability
- Impression management
- Intentional distortion, and
- Self-enhancement
- Hough, Eaton, Dunnette, Kamp, & McCloy (1990); Lautenschlager (1994); Ones, Viswesvaran, & Korbin (1995)
6. Maximal vs. Typical Performance
- Faking can be understood through the distinction between maximal and typical performance.
- Cronbach (1984)
- This distinction is useful in understanding faking.
7. Maximal Performance
- Maximal performance tests assess how respondents
perform when doing their best. - A mathematics test of subtraction is an
assessment of maximal performance in that one is
motivated to subtract numbers as accurately as
one is able. - Cognitive ability and job knowledge tests are
also maximal performance measures.
8. Maximal Performance
- In high-stakes testing, such as employment testing, people are motivated to do their best, that is, to provide their maximal performance.
- In high-stakes testing, both those answering honestly and those seeking to fake have the same motivation: give the correct answer.
- One can guess on a maximal performance test, but one cannot fake.
9. Maximal Performance
- Maximal performance tests do not have faking
problems because the rules of the test (make
yourself look good by giving the correct answer)
and the rules of the testing situation (make
yourself look good by giving the correct answer)
are the same.
10. Typical Performance
- In typical performance tests, the rules of the test are to report how one typically behaves.
- In personality tests, the instructions are usually like this:
- Please use the rating scale below to describe how accurately each statement describes you. Describe yourself as you generally are now, not as you wish to be in the future. Describe yourself as you honestly see yourself.
- Adapted from http://ipip.ori.org/newIPIPinstructions.htm
11. Typical Performance
- Thus, in a typical performance test, if one is
lazy and undependable, one is asked to report on
the test that one is lazy and undependable. - The rules of the test (describe how you typically
behave) contradict the rules of the testing
situation (make yourself look good by giving the
correct answer). - This contradiction makes faking likely.
12. Typical Performance
- If one who is lazy and undependable answers honestly, one will do poorly on the test.
- If one who is lazy and undependable fakes, the respondent reports that they are industrious and dependable. The respondent who fakes will do well on the test.
- Example: McDaniel's messy desk
13. Typical Performance
- Thus, one can improve one's score on a personality test by ignoring the rules of the test (describe how you typically behave) and by following the rules of the testing situation (make yourself look good by giving the correct answer).
14. Typical Performance
- On typical performance tests, it is easy to know the correct responses:
- Dependable
- Agreeable
- Emotionally stable
- Thus, it is easy to fake on typical performance measures, such as personality tests, and one can dramatically improve one's score through faking.
15. How much faking is there?
- Over two-thirds (68%) of members of the Society for Human Resource Management (SHRM) thought that integrity tests were not useful because they were susceptible to faking.
- Rynes, Brown, & Colbert (2002)
- Similarly, 70% of professional assessors believe that faking is a serious obstacle to measurement.
- Robie, Tuzinski, & Bly (2006)
- These results suggest that there is frequent faking in testing situations.
16. How much faking is there?
- There is some emerging evidence that patterns exist regarding the proportion of fakers in a given sample.
- Specifically, converging evidence, though tentative, indicates that approximately 50% of a sample typically will not fake, with most of the rest being slight fakers and a select few being extreme fakers.
17. How much faking is there?
- One study found that 30-50% of applicants elevated their scores compared to later honest ratings.
- Griffeth et al. (2005)
- There is also self-reported survey evidence that 65% of people say they would not fake an assessment, with 17% unsure and 17% indicating they would fake.
- Rees & Metcalfe (2003)
- None of this is encouraging for practitioners, because the presence of moderate numbers of fakers, particularly small numbers of extreme fakers, presents significant problems when attempting to select the best applicants.
- Komar (2008)
18. Personality tests are big business
- Over a third of US corporations use personality testing, and the industry takes in nearly $500 million in annual revenue.
- Rothstein & Goffin (2006)
19. Stop using personality tests?
- The fact that applicants may be highly motivated to fake in order to gain employment has raised many questions as to the usefulness of non-cognitive measures.
- Some have even gone so far as to suggest that personality measurement should not be used for employee selection.
- Murphy & Dzieweczynski (2005)
20. But personality predicts
- Personality tests predict important work outcomes, such as job performance and training performance.
- Barrick, Mount, & Judge (2001); Bobko, Roth, & Potosky (1999); Hough & Furnham (2003); Schmidt & Hunter (1998)
21. Predict even with faking
- Personality measures predict work outcomes, even under conditions where faking is likely.
- Rothstein and Goffin state that there are "abundant grounds for optimism that the usefulness of personality testing in personnel selection is not neutralized by faking" (p. 166).
22. Faking still causes problems
- Even though personality measures often produce moderate predictive validities, there are a number of other ways that faking can cause problems, including problems with
- the construct validity of measures
- changes in the rank order of who is selected.
23. Evidence of faking
24. Evidence of faking
- The concept of faking is relatively straightforward:
- People engage in impression management and actively try to make themselves appear to have more desirable traits than they actually possess.
- However, identifying actual faking behaviors in a statistical sense has proven to be exceedingly difficult.
- Hough & Oswald (2005)
25. Faking shows itself in various ways
- Attempts to fake can show up in a number of statistical indicators:
- test means
- social desirability scales
- criterion-related validity
- actual or simulated hiring decisions
- construct validity.
- There is ample evidence that faking likely influences most of these crucial test properties.
26. Social desirability as faking
- The construct of social desirability holds that the tendency to manage the impression one maintains with others is a stable individual difference that can be measured using a traditional, Likert-style, self-report survey.
- Paulhus & John (1998)
- Social desirability items are unlikely virtues, that is, behaviors that we recognize as good but that no one usually does:
- I have never been angry.
- I pick up trash off the street when I see it.
- I am always nice to everyone.
27. Social desirability as faking
- Applicants for a job had higher social desirability scores than incumbents, which was interpreted as evidence that the applicants were faking.
- Rosse, Stecher, Miller, & Levine (1998)
- The initial view regarding social desirability from an applied perspective was that it could be measured in a selection context and used to correct, or adjust, the non-cognitive scores included in the test (see the sketch below).
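To make this concrete, below is a minimal sketch (in Python, with simulated data rather than any of the cited datasets) of what such a correction typically looks like: the observed personality score is residualized on the social desirability score. As the following slides note, corrections of this kind do not improve prediction.

import numpy as np

# Simulated data: an observed conscientiousness score contaminated by a
# social desirability (SD) response tendency. Values are illustrative only.
rng = np.random.default_rng(0)
n = 500
sd_score = rng.normal(0, 1, n)                    # social desirability scale score
true_trait = rng.normal(0, 1, n)                  # latent conscientiousness
observed = true_trait + 0.5 * sd_score + rng.normal(0, 0.5, n)

# "Correct" the observed score by regressing it on SD and keeping the residual.
slope, intercept = np.polyfit(sd_score, observed, 1)
corrected = observed - (intercept + slope * sd_score)

print("r(observed, SD)  =", round(np.corrcoef(observed, sd_score)[0, 1], 2))
print("r(corrected, SD) =", round(np.corrcoef(corrected, sd_score)[0, 1], 2))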
28. Social desirability as faking
- Social desirability does not function as frequently theorized.
- A meta-analysis showed that social desirability does not account for variance in the personality-performance relationship.
- Ones, Viswesvaran, & Reiss (1996)
- This means that knowledge of a person's level of social desirability will not improve the measurement of that person's standing on a non-cognitive trait.
29. Social desirability as faking
- Stated another way, this means that one cannot correct a person's personality test score for social desirability to improve prediction.
- Applicants often fake in ways that are not likely to be detected by social desirability scores.
- Alliger, Lilienfeld, & Mitchell (1996); Zickar & Robie (1999)
- Summary: Social desirability is a poor indicator of applicant faking behavior.
30. Mean difference as faking
- Faking is apparent when one compares responses of groups of people who take a test under different instructions.
- Test scores under fake-good instructions lead to higher test means than scores under honest instructions (d = .6 across Big 5 personality dimensions; a sketch of this effect size follows below).
- Viswesvaran & Ones (1999)
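The d values on this slide and the next are standardized mean differences (Cohen's d). Below is a minimal sketch of how such an effect size is computed from two groups of scores; the numbers are simulated, not taken from the cited studies.

import numpy as np

def cohens_d(group1, group2):
    # Standardized mean difference using the pooled standard deviation.
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * np.var(group1, ddof=1) +
                  (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
fake_good = rng.normal(3.9, 0.6, 300)   # means elevated under fake-good instructions
honest = rng.normal(3.5, 0.6, 300)      # honest-instruction condition

print("d =", round(cohens_d(fake_good, honest), 2))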
31. Mean difference as faking
- This pattern is similar when comparing actual applicants and incumbents.
- The largest effects are found for the traditionally most predictive personality dimensions in personnel selection, conscientiousness (d = .45) and emotional stability (d = .44).
- Birkeland, Manson, Kisamore, Brannick, & Smith (2006)
- Integrity test means show the same pattern of increased means in faking conditions (d = .36 to 1.02).
- Alliger & Dwight (2000)
32. Mean difference as faking
- Thus, people have the highest means in experimental, fake-good designs and somewhat lower means in applicant settings, and these means are nearly always higher than in honest/incumbent conditions.
- These are the most consistent findings in faking research, and they are often taken as the most persuasive evidence that faking occurs.
33. Mean difference as faking
- Although the mean differences between faking and honest groups permit one to conclude that faking occurs, they are of little help in identifying which applicants are faking.
34. Criterion-related validity and faking
- Criterion-related validity is the correlation
between a test and an important work outcome,
such as job performance. - It is logical to assume that as applicants fake
more, the test will be less able to predict
important work outcomes.
35. Criterion-related validity and faking
- Students' conscientiousness ratings (measured with personality and biodata instruments) were much less predictive of supervisor ratings when they completed the measures under fake-good instructions.
- Douglas, McDaniel, & Snell (1996)
- The general pattern in applied samples is similar, as predictive validity is highest in incumbent (supposedly honest) samples, slightly lower for applicants, and drastically lower under fake-good directions.
- Hough (1998)
- These findings are commonly interpreted as supporting the hypothesis that faking may lower criterion-related validity, but it often does not do so drastically.
36. Criterion-related validity and faking
- There are a number of caveats to this general pattern regarding predictive validity.
- One is situation strength: when tests are administered in ways that restrict natural variation, criterion-related validity will drop.
- Beatty, Cleveland, & Murphy (2001)
- For example, if an organization clearly advertises that it only hires the most conscientious people, then applicants are more likely to fake to appear more conscientious.
37. Criterion-related validity and faking
- Another caveat is the number of people who fake.
- A Monte Carlo simulation found that the best-case scenario for faking is an all-or-nothing proposition: validity is retained with no fakers or many fakers, but if there is a small minority of fakers present, they are likely to be rewarded, thus dragging overall test validity down (see the simplified sketch below).
- Komar, Brown, Komar, & Robie (2008)
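The following is a simplified illustration of the all-or-nothing idea, not a reproduction of Komar et al.'s simulation: a varying proportion of simulated applicants inflate their scores by a constant amount, and the observed test-criterion correlation is tracked. Validity is untouched when no one or everyone fakes, and drops when only some applicants fake.

import numpy as np

rng = np.random.default_rng(2)
n = 2000

def simulate_validity(prop_fakers, true_validity=0.30, faking_boost=1.5):
    trait = rng.normal(0, 1, n)
    # Job performance correlates with the trait at the chosen true validity.
    performance = true_validity * trait + np.sqrt(1 - true_validity ** 2) * rng.normal(0, 1, n)
    observed = trait.copy()
    fakers = rng.random(n) < prop_fakers
    observed[fakers] += faking_boost          # fakers inflate their observed scores
    return np.corrcoef(observed, performance)[0, 1]

for p in [0.0, 0.1, 0.25, 0.5, 1.0]:
    print(f"proportion of fakers = {p:.2f}, observed validity = {simulate_validity(p):.2f}")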
38. Criterion-related validity and faking
- A final caveat is that the criterion-related validity of the test as a whole may not be sensitive to changes in the rank ordering of applicants.
- Komar et al. (2008)
- This assumption was tested by rank-ordering participants from two conditions (honest and fake-good) and then dividing the distribution into thirds.
- Mueller-Hanson, Heggestad, & Thornton (2003)
- The results indicated that the top third, which included a high percentage of participants who were given faking instructions, had low validity (r = .07), while the bottom third produced high validity (r = .45). A sketch of this kind of analysis follows below.
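Here is a minimal sketch of that style of analysis with simulated data (not Mueller-Hanson et al.'s): honest and fake-good respondents are pooled, ranked on the test, split into thirds, and the test-criterion correlation is computed within each third.

import numpy as np

rng = np.random.default_rng(3)
n_per_group = 300

trait = rng.normal(0, 1, 2 * n_per_group)
performance = 0.35 * trait + np.sqrt(1 - 0.35 ** 2) * rng.normal(0, 1, 2 * n_per_group)

observed = trait.copy()
# The second half of the sample responds under fake-good instructions.
observed[n_per_group:] += rng.normal(1.2, 0.5, n_per_group)

order = np.argsort(-observed)                 # rank from highest to lowest test score
for label, idx in zip(["top", "middle", "bottom"], np.array_split(order, 3)):
    r = np.corrcoef(observed[idx], performance[idx])[0, 1]
    print(f"{label:>6} third: r = {r:.2f}")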
39. Criterion-related validity and faking
- Thus, a criterion-related validity study may show
that the test predicts job performance. However,
the test may not predict well for the top scoring
individuals because these are the individuals who
fake.
40. Selection decisions and faking
- The last slide suggested that those who fake may
cluster at the top of the score list. - This introduces the topic of selection decisions
and faking.
41. Selection decisions and faking
- It is a common finding that people who fake (identified by higher social desirability scores or by higher proportions of those from a faking condition) will rise to the top of the selection distribution and increase their probability of being hired.
- Mueller-Hanson et al. (2003); Rosse et al. (1998)
- This situation worsens as the selection ratio is lowered (fewer people are selected), because more of those selected are likely to be fakers (see the sketch below).
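A minimal sketch of this point with simulated data: under top-down selection, the share of fakers among those hired grows as the selection ratio shrinks. The 15% faking rate and the one-standard-deviation score boost are illustrative assumptions, not figures from the cited studies.

import numpy as np

rng = np.random.default_rng(4)
n = 1000
is_faker = rng.random(n) < 0.15                               # assume 15% of applicants fake
scores = rng.normal(0, 1, n) + np.where(is_faker, 1.0, 0.0)   # fakers gain about 1 SD

for selection_ratio in [0.50, 0.25, 0.10, 0.05]:
    hired = np.argsort(-scores)[: int(n * selection_ratio)]   # hire the top scorers
    print(f"selection ratio {selection_ratio:.2f}: "
          f"{is_faker[hired].mean():.0%} of those hired are fakers")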
42. Selection decisions and faking
- One study obtained applicant personality scores and then honest scores one month later.
- Out of 60 participants, one individual who was ranked 4th on the applicant test dropped to 52nd on the honest test, indicating a large amount of faking.
- Griffeth, Chmielowski, & Yoshita (2005)
43. Selection decisions and faking
- Numerous additional studies have provided similar
findings, suggesting that the rank order of
applicants will change considerably under
different motivational and instructional
conditions. - This pattern is usually attributed to faking
behavior, but it can also be partly explained by
random or chance variation.
44. Selection decisions and faking
- People might score higher or lower on a second test administration due to random factors (e.g., feeling ill).
- Regardless, these consistent findings mean that users of non-cognitive tests cannot simply rely on a test's predictive validity to justify its utility as a selection device.
45. Construct validity and faking
- The construct validity of a test concerns its internal structure and its relationships with other variables.
- Construct validity helps one to understand what the test measures and what it does not.
46. Construct validity and faking
- Construct validity is often overlooked in favor
of criterion-related validity. - However, construct validity is crucially
important regarding the quality of what is
measured. - Construct validity can also help us understand
faking.
47. Construct validity and faking
- Factor analysis is a statistical method that helps determine the constructs measured by a test.
- Research indicates that construct validity does indeed drop when faking is likely present.
- The factor structure of non-cognitive tests, especially personality tests, tends to degrade when applicants are compared with incumbents: an extra factor often emerges, with each item loading on that factor in addition to loading on the hypothesized factors (see the sketch below).
- Zickar & Robie (1999); Cellar, Miller, Doverspike, & Klawsky (1996)
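A minimal sketch of the idea with simulated data: faking adds a shared component across all items, which shows up as extra common variance (an additional factor) in applicant-like responses. For brevity the sketch inspects the leading eigenvalues of the item correlation matrix rather than fitting a full factor model; the two-trait, eight-item structure is a hypothetical illustration.

import numpy as np

rng = np.random.default_rng(5)
n, items_per_trait = 500, 4

def make_responses(faking=False):
    traits = rng.normal(0, 1, (n, 2))                  # two intended constructs
    loadings = np.zeros((2, 2 * items_per_trait))
    loadings[0, :items_per_trait] = 0.7                # items 1-4 load on trait 1
    loadings[1, items_per_trait:] = 0.7                # items 5-8 load on trait 2
    responses = traits @ loadings + rng.normal(0, 0.5, (n, 2 * items_per_trait))
    if faking:
        responses += 0.8 * rng.normal(0, 1, (n, 1))    # shared faking factor on every item
    return responses

for label, faking in [("incumbent-like", False), ("applicant-like", True)]:
    corr = np.corrcoef(make_responses(faking), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    print(f"{label:>15}: leading eigenvalues = {np.round(eigvals[:3], 2)}")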
48. Construct validity and faking
- This means that the non-cognitive constructs
actually change under faking conditions, shedding
some doubt as to how similar they remain to the
intended, less-biased constructs.
49. Summary of Faking Studies
- Applicants can fake and some do fake.
- Evidence for faking can be seen in various types
of studies. - But there is no good technology for
differentiating the fakers from the honest
respondents.
50. Practical issues in test delivery
51. Properties of the selection system
- Two key aspects of selection systems are particularly relevant to the issue of faking:
- Multiple-hurdle vs. compensatory systems
- Use and appropriate setting of cut scores
52. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A multiple-hurdle system involves a series of stages that an applicant must pass through to ultimately be hired for the job.
- This usually involves setting cut scores (a line below which applicants are removed from the pool) at each step, or for each test in a selection battery.
53. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A compensatory system, on the other hand, typically involves an overall score that is computed for each applicant, meaning that a high score on one test can compensate for a low score on another (see the sketch below).
- Bott, O'Connell, Ramakrishnan, & Doverspike (2007)
54. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A common validation procedure involves setting cut scores based on incumbent data and then applying that standard to applicants.
- The higher means in applicant groups could result in systematic bias in the cut scores.
- Basically, since there is faking in applicant samples, using the cut score determined from incumbent data will result in too many applicants passing the cut score (see the sketch below).
- Bott et al. (2007)
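A minimal sketch of the cut-score issue with simulated distributions (the means and the 50% incumbent pass rate are illustrative assumptions): a cut score set at the incumbent median passes far more applicants when applicant means are elevated by faking.

import numpy as np

rng = np.random.default_rng(7)
incumbents = rng.normal(3.5, 0.5, 1000)       # honest incumbent scores
applicants = rng.normal(3.8, 0.5, 1000)       # applicant scores, elevated by faking

cut_score = np.median(incumbents)             # cut set so about half of incumbents pass
print("incumbent pass rate:", round(float((incumbents >= cut_score).mean()), 2))
print("applicant pass rate:", round(float((applicants >= cut_score).mean()), 2))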
55. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- Personality tests may best be used from a select-out rather than the traditional select-in perspective.
- Mueller-Hanson et al. (2003)
- This means that the non-cognitive measures' primary purpose would be to weed out the very undesirable candidates rather than to identify the applicants with the highest level of the trait.
- Don't hire the people who state that they are lazy and undependable.
- But know that many of the people who score well on the personality test are also lazy and undependable.
- Thus, the goal of the personality test is to reject those who are lazy and undependable and willing to admit it.
56. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- Using a personality test or other non-cognitive measure as a screen-out allows many more applicants to pass the hurdle, thereby increasing the potential cost of the system.
- One still needs to screen the remaining applicants.
- Select-out may be a reasonable option under conditions of:
- A high selection ratio (many openings relative to the number of applicants)
- Or low cost per test administered (such as unproctored internet testing).
- Practitioners have to carefully consider and justify how the setting of cut scores matches the goals and constraints of different selection systems.
57. Situational judgment tests with knowledge instructions
- As noted in a previous presentation at this conference, situational judgment tests can be administered with knowledge instructions.
- Knowledge instructions ask the applicants to identify the best response or to rate all responses for effectiveness.
58. Situational judgment tests with knowledge instructions
- Knowledge instructions for situational judgment tests should make them resistant to faking.
- McDaniel, Hartman, Whetzel, & Grubb (2007); McDaniel & Nguyen (2001); Nguyen, Biderman, & McDaniel (2005)
59. Situational judgment tests with knowledge instructions
- Although resistant to faking, these tests still measure non-cognitive traits, specifically:
- Conscientiousness
- Agreeableness
- Emotional stability
- McDaniel, Hartman, Whetzel, & Grubb (2007)
60. Situational judgment tests with knowledge
instructions
- Thus, situational judgment tests hold great
promise for measuring non-cognitive traits while
reducing, and perhaps eliminating, faking. - There are some limitations.
61. Situational judgment tests with knowledge instructions
- Limitations:
- It is hard to target a situational judgment test to a particular construct.
- It is hard to build homogeneous scales.
- With personality tests, one can easily build a scale to measure conscientiousness and another to measure agreeableness.
- Situational judgment tests seldom have clear subtest scales.
62. Faking and cognitive ability
- The ability to fake may be related to cognitive ability, such that those who are more intelligent can fake better.
- The small literature on this topic is contradictory.
- If faking is dependent on cognitive ability, then faking should increase the correlation between personality and cognitive ability.
63. Faking and cognitive ability
- One advantage of non-cognitive tests is that they
show smaller mean differences across ethnic
groups. - If the ethnic group differences are due to mean
differences in cognitive ability, and if faking
increases the correlation between personality and
cognitive ability, faking should make the ethnic
group differences in personality larger.
64. Faking and cultural differences
- Almost all faking research is done with U.S.
samples. - The prevalence of faking might be substantially
larger in other cultures. - For example, in cultures where bribery is a
common business practice, one might expect more
faking.
65. Potential solutions to faking
66. Social desirability scales
- The literature is very clear that social desirability scales do not help in identifying fakers.
- Statistical corrections based on social desirability scales do not improve validity.
- Ellingson, Sackett, & Hough (1999)
- Ones, Viswesvaran, & Reiss (1996)
- Schmitt & Oswald (2006)
67. Frame of reference
- The rationale behind frame of reference testing
is to design tests that encourage test takers to
focus on their behavior in a particular setting
(e.g., work).
68. Frame of reference
- An example of frame of reference is the addition of the phrase "at work" at the end of each item.
- Typical item: I am dependable.
- Frame-of-reference item: I am dependable at work.
69. Frame of reference
- There is some evidence that frame-of-reference testing may increase validity.
- Bing, Whanger, Davison, & VanHook (2004); Hunthausen, Truxillo, Bauer, & Hammer (2003)
- However, there is no evidence that frame-of-reference testing reduces faking behavior.
70. Test instructions: Coaching
- If we want people to respond to our tests in a certain way, we can simply tell them via test instructions.
- Coaching is one kind of instruction, usually in the form of a vignette or example describing how to approach an item in a socially desirable way.
- Coaching predictably leads to faking behavior (as evidenced by higher test means) and is certainly a problem as advice on how to beat non-cognitive tests circulates around the internet.
71. Test instructions: Warning
- Another popular strategy is to warn test takers
that they will be identified and removed from the
selection pool if they fake (known as a warning
of identification and consequences).
72. Test instructions: Warning
- A meta-analysis indicated that warnings generally lower test means relative to standard instructions (d = .23), although there was considerable variability in the direction and magnitude of effects in the studies included.
- Dwight & Donovan (2003)
73. Test instructions: Warning
- Problems:
- Warnings may increase the correlation between the personality scales and cognitive ability.
- Vasilopoulos et al. (2005)
- Since one cannot actually identify the fakers, it is dishonest to warn test-takers that fakers can indeed be identified.
- Zickar & Robie (1999)
74. Test instructions: Warning
- If most applicants heed the warning and do not
fake, those who do fake may more easily obtain
higher test scores. - Thus, warnings are admittedly an imperfect method
for combating faking, and more research is needed
to determine the extent of their utility.
75. Get data other than self-report
- Personality and other non-cognitive constructs are often evaluated for selection purposes through ratings by others, including interviews and assessment centers.
- Approximately 35% of interviews explicitly measure non-cognitive constructs, such as personality and social skills, according to meta-analytic evidence.
- Huffcutt, Conway, Roth, & Stone (2001)
76. Get data other than self-report
- Similarly, many common assessment center dimensions involve non-cognitive aspects, including communication and influencing others.
- Arthur, Day, McNelly, & Edens (2003)
77. Get data other than self-report
- Little faking and impression management research
has examined faking in interviews and assessment
centers. - However, it is logical that those who would fake
in a personality inventory would also fake in an
interview or an assessment center.
78. Forced-choice measures
- Item 1: Choose one item that is most like you, and one item that is least like you.
79. Forced-choice measures
- Forced-choice measures differ from Likert-type scales because they take equally desirable items (desirability usually determined by independent raters) and force the respondent to choose (a scoring sketch follows below).
- Forced-choice has costs:
- Abandoning the interval-level scale of measurement
- Abandoning the clearer construct scaling that Likert measures offer.
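A minimal sketch of how one most/least forced-choice block might be scored; the items, trait keys, and the simple +1/-1 scoring rule are hypothetical illustrations, not a published scoring scheme.

from collections import defaultdict

# Each statement in the block is keyed to a trait; desirability is assumed
# to have been matched across statements by independent raters.
block = [
    {"text": "I finish tasks on time", "trait": "conscientiousness"},
    {"text": "I stay calm under pressure", "trait": "emotional_stability"},
    {"text": "I get along with everyone", "trait": "agreeableness"},
]

def score_block(block, most_index, least_index):
    scores = defaultdict(int)
    scores[block[most_index]["trait"]] += 1   # "most like me" adds a point
    scores[block[least_index]["trait"]] -= 1  # "least like me" subtracts a point
    return dict(scores)

# A respondent picks statement 0 as most like them and statement 2 as least like them.
print(score_block(block, most_index=0, least_index=2))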
80. Forced-choice measures
- Whether the benefits of forced-choice formats, such as potentially reducing faking, justify these costs is questionable.
- The effect of forced-choice on test means is unclear, as some studies show higher means for forced-choice compared with Likert measures and others indicate lower means.
- Heggestad, Morrison, Reeve, & McCloy (2006); Vasilopoulos, Cucina, Dyomina, Morewitz, & Reilly (2006)
81. Forced-choice measures
- Research on the effect of forced-choice on
selection decisions used items in both a
forced-choice and Likert format under
pseudo-applicant instructions (pretend you are
applying for a job). - Heggestad et al. (2006)
82. Forced-choice measures
- They compared the rank order produced by both tests to an honest condition using a different personality measure.
- Results showed few differences in the rank orders between the measures, offering preliminary evidence that forced-choice does not improve selection decisions.
- In summary, forced-choice tests do not necessarily reduce faking, and the statistical and conceptual limitations associated with their use probably do not justify replacing traditional non-cognitive test formats.
83. Recommendations for Practice
84. Avoid corrections
- Little evidence exists that social desirability scales or lie scales can identify faking.
- Many tests include lie scales with instructions about how to correct scores based on lie scales, with the justification that corrections will improve test validity.
- Rothstein & Goffin (2006)
- There is no evidence to support this assertion, rendering corrections a largely indefensible strategy.
85. Specify how non-cognitive measures fit the goals
of the selection system
- Given the consistent effect of faking on test
means, faking will affect cut scores and who is
selected in both compensatory and multiple hurdle
systems. - Bott et al. (2007)
- Cut scores may have to be adjusted upward if they
are set based on incumbent scores.
86. Specify how non-cognitive measures fit the goals
of the selection system
- The select-out strategy is an option.
- Reject applicants who are willing to admit that
they are lazy and undependable - Screen the remaining applicants with a maximal
performance measure that is faking-free or
faking-resistant. - Select-out is a good strategy when the selection
ratio is high (i.e., you will hire most of those
who apply).
87. Recognize that criterion-related validity may say little about faking
- It is common for a test to show a useful level of validity even when faking is known to be present.
- However, the fakers are represented in greater proportions at the high end of the test scores.
- The validity may be much worse among these applicants.
88. Manipulate the motivation of the applicants
- If applicants are given information about the job to which they are applying, they can fake their scores toward that stereotype.
- Mahar, Cologon, & Duck (1995)
- On the other hand, if applicants are informed about the potential consequences of poor fit, which faking could realistically lead to during the placement phase, they may be motivated to respond more honestly, and initial research indicates that this may be true.
- Nordlund & Snell (2006)
89. Conclusion
- Non-cognitive tests can be faked.
- Non-cognitive tests are faked.
- There is no method to eliminate faking.
- Consider using non-cognitive tests as select-out
screens
90. Conclusion
- Use maximal performance tests (cognitive ability
and job knowledge) to screen those who remain. - Consider measuring non-cognitive traits with
faking-resistant situational judgment tests with
knowledge instructions.
91. References
- References are in the book chapter.