Title: Review of Faking in Personnel Selection
1. Review of Faking in Personnel Selection
Chris D. Fluckinger, University of Akron, cdf12@uakron.edu
Deborah L. Whetzel, Human Resources Research Organization, dwhetzel@humrro.org
Michael A. McDaniel, Virginia Commonwealth University, mamcdani@vcu.edu
Prepared for the International Workshop on Emerging Frameworks and Issues for ST Recruitments, Society for Reliability Engineering, Quality and Operations Management (SREQOM), Delhi, India, September 2008
2.
- We note that Chris D. Fluckinger is the senior
author of our book chapter associated with this
conference. Although not present at the
conference, his contributions to this
presentation were substantial.
3. Goal of this Presentation
- Provide practitioners and researchers with a
solid understanding of the practical issues
related to faking in test delivery and
assessment.
4. Overview
- Typical vs. maximal performance
- The usefulness of different strategies to identify faking
- How faking creates challenges to test delivery and measurement
- Review and critique of common strategies to combat faking
5. Faking
- Faking is a conscious effort to improve one's score on a selection instrument.
- Faking has been described using various terms, including
- Response distortion
- Social desirability
- Impression management
- Intentional distortion, and
- Self-enhancement
- Hough, Eaton, Dunnette, Kamp, & McCloy (1990); Lautenschlager (1994); Ones, Viswesvaran, & Korbin (1995)
6. Maximal vs. Typical Performance
- Faking can be understood through the distinction between maximal and typical performance.
- Cronbach (1984)
- This distinction is useful in understanding faking.
7. Maximal Performance
- Maximal performance tests assess how respondents
perform when doing their best. - A mathematics test of subtraction is an
assessment of maximal performance in that one is
motivated to subtract numbers as accurately as
one is able. - Cognitive ability and job knowledge tests are
also maximal performance measures.
8. Maximal Performance
- In high-stakes testing, such as employment testing, people are motivated to do their best, that is, to provide their maximal performance.
- In high-stakes testing, both those answering honestly and those seeking to fake have the same motivation: give the correct answer.
- One can guess on a maximal performance test, but one cannot fake.
9. Maximal Performance
- Maximal performance tests do not have faking
problems because the rules of the test (make
yourself look good by giving the correct answer)
and the rules of the testing situation (make
yourself look good by giving the correct answer)
are the same.
10. Typical Performance
- In typical performance tests, the rules of the test are to report how one typically behaves.
- In personality tests, the instructions are usually like this:
- Please use the rating scale below to describe how accurately each statement describes you. Describe yourself as you generally are now, not as you wish to be in the future. Describe yourself as you honestly see yourself.
- Adapted from http://ipip.ori.org/newIPIPinstructions.htm
11. Typical Performance
- Thus, in a typical performance test, if one is
lazy and undependable, one is asked to report on
the test that one is lazy and undependable. - The rules of the test (describe how you typically
behave) contradict the rules of the testing
situation (make yourself look good by giving the
correct answer). - This contradiction makes faking likely.
12. Typical Performance
- If one who is lazy and undependable answers honestly, one will do poorly on the test.
- If one who is lazy and undependable fakes, the respondent reports that they are industrious and dependable. The respondent who fakes will do well on the test.
- Example: McDaniel's messy desk
13. Typical Performance
- Thus, one can improve one's score on a personality test by ignoring the rules of the test (describe how you typically behave) and by following the rules of the testing situation (make yourself look good by giving the correct answer).
14. Typical Performance
- On typical performance tests, it is easy to know the correct responses:
- Dependable
- Agreeable
- Emotionally stable
- Thus, it is easy to fake on typical performance measures, such as personality tests, and one can dramatically improve one's score through faking.
15. How much faking is there?
- Over two-thirds (68%) of members of the Society for Human Resource Management (SHRM) thought that integrity tests were not useful because they were susceptible to faking.
- Rynes, Brown, & Colbert (2002)
- Similarly, 70% of professional assessors believe that faking is a serious obstacle to measurement.
- Robie, Tuzinski, & Bly (2006)
- These results suggest that there is frequent faking in testing situations.
16. How much faking is there?
- There is some emerging evidence that patterns exist regarding the proportion of fakers in a given sample.
- Specifically, converging evidence, though tentative, indicates that approximately 50% of a sample typically will not fake, with most of the rest being slight fakers and a select few being extreme fakers.
17. How much faking is there?
- One study found that 30-50% of applicants elevated their scores compared to later honest ratings.
- Griffeth et al. (2005)
- There is also self-reported survey evidence that 65% of people say they would not fake an assessment, with 17% unsure and 17% indicating they would fake.
- Rees & Metcalfe (2003)
- None of this is encouraging for practitioners, because the presence of moderate numbers of fakers, particularly small numbers of extreme fakers, presents significant problems when attempting to select the best applicants.
- Komar (2008)
18. Personality tests are big business
- Over a third of US corporations use personality testing, and the industry takes in nearly $500 million in annual revenue.
- Rothstein & Goffin (2006)
19. Stop using personality tests?
- The fact that applicants may be highly motivated to fake in order to gain employment has raised many questions as to the usefulness of non-cognitive measures.
- Some have even gone so far as to suggest that personality measurement should not be used for employee selection.
- Murphy & Dzieweczynski (2005)
20. But personality predicts
- Personality tests predict important work outcomes, such as job performance and training performance.
- Barrick, Mount, & Judge (2001); Bobko, Roth, & Potosky (1999); Hough & Furnham (2003); Schmidt & Hunter (1998)
21. Predict even with faking
- Personality measures predict work outcomes, even under conditions where faking is likely.
- Rothstein and Goffin state that there are "abundant grounds for optimism that the usefulness of personality testing in personnel selection is not neutralized by faking" (p. 166).
22. Faking still causes problems
- Even though personality measures often produce moderate predictive validities, there are a number of other ways that faking can cause problems, including problems with
- the construct validity of measures
- changes in the rank order of who is selected.
23. Evidence of faking
24. Evidence of faking
- The concept of faking is relatively straightforward:
- People engage in impression management and actively try to make themselves appear to have more desirable traits than they actually possess.
- However, identifying actual faking behaviors in a statistical sense has proven to be exceedingly difficult.
- Hough & Oswald (2005)
25. Faking shows itself in various ways
- Attempts to fake can show up in a number of statistical indicators:
- test means
- social desirability scales
- criterion-related validity
- actual or simulated hiring decisions
- construct validity.
- There is ample evidence that faking likely influences most of these crucial test properties.
26. Social desirability as faking
- The construct of social desirability holds that the tendency to manage the impression one maintains with others is a stable individual difference that can be measured using a traditional, Likert-style, self-report survey.
- Paulhus & John (1998)
- Social desirability items are unlikely virtues, that is, behaviors that we recognize as good but that no one usually does:
- I have never been angry.
- I pick up trash off the street when I see it.
- I am always nice to everyone.
27. Social desirability as faking
- Applicants for a job had higher social desirability scores than incumbents, which was interpreted as evidence that the applicants were faking.
- Rosse, Stecher, Miller, & Levine (1998)
- The initial view regarding social desirability from an applied perspective was that it could be measured in a selection context and used to correct, or adjust, the non-cognitive scores included in the test (see the sketch below).
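To make this concrete, below is a minimal sketch (in Python, with simulated data rather than any of the cited datasets) of what such a correction typically looks like: the observed personality score is residualized on the social desirability score. As the following slides note, corrections of this kind do not improve prediction.

import numpy as np

# Simulated data: an observed conscientiousness score contaminated by a
# social desirability (SD) response tendency. Values are illustrative only.
rng = np.random.default_rng(0)
n = 500
sd_score = rng.normal(0, 1, n)                    # social desirability scale score
true_trait = rng.normal(0, 1, n)                  # latent conscientiousness
observed = true_trait + 0.5 * sd_score + rng.normal(0, 0.5, n)

# "Correct" the observed score by regressing it on SD and keeping the residual.
slope, intercept = np.polyfit(sd_score, observed, 1)
corrected = observed - (intercept + slope * sd_score)

print("r(observed, SD)  =", round(np.corrcoef(observed, sd_score)[0, 1], 2))
print("r(corrected, SD) =", round(np.corrcoef(corrected, sd_score)[0, 1], 2))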
28. Social desirability as faking
- Social desirability does not function as frequently theorized.
- A meta-analysis showed that social desirability does not account for variance in the personality-performance relationship.
- Ones, Viswesvaran, & Reiss (1996)
- This means that knowledge of a person's level of social desirability will not improve the measurement of that person's standing on a non-cognitive trait.
29. Social desirability as faking
- Stated another way, this means that one cannot correct a person's personality test score for social desirability to improve prediction.
- Applicants often fake in ways that are not likely to be detected by social desirability scores.
- Alliger, Lilienfeld, & Mitchell (1996); Zickar & Robie (1999)
- Summary: Social desirability is a poor indicator of applicant faking behavior.
30. Mean difference as faking
- Faking is apparent when one compares responses of groups of people who take a test under different instructions.
- Test scores under fake-good instructions lead to higher test means than scores under honest instructions (d = .6 across Big 5 personality dimensions; a sketch of this effect size follows below).
- Viswesvaran & Ones (1999)
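The d values on this slide and the next are standardized mean differences (Cohen's d). Below is a minimal sketch of how such an effect size is computed from two groups of scores; the numbers are simulated, not taken from the cited studies.

import numpy as np

def cohens_d(group1, group2):
    # Standardized mean difference using the pooled standard deviation.
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * np.var(group1, ddof=1) +
                  (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
fake_good = rng.normal(3.9, 0.6, 300)   # means elevated under fake-good instructions
honest = rng.normal(3.5, 0.6, 300)      # honest-instruction condition

print("d =", round(cohens_d(fake_good, honest), 2))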
31. Mean difference as faking
- This pattern is similar when comparing actual applicants and incumbents.
- The largest effects are found for the traditionally most predictive personality dimensions in personnel selection, conscientiousness (d = .45) and emotional stability (d = .44).
- Birkeland, Manson, Kisamore, Brannick, & Smith (2006)
- Integrity test means show the same pattern of increased means in faking conditions (d = .36 to 1.02).
- Alliger & Dwight (2000)
32. Mean difference as faking
- Thus, people have the highest means in experimental, fake-good designs and somewhat lower means in applicant settings, and these means are nearly always higher than in honest/incumbent conditions.
- These are the most consistent findings in faking research, and they are often taken as the most persuasive evidence that faking occurs.
33. Mean difference as faking
- Although the mean differences between faking and honest groups permit one to conclude that faking occurs, they are of little help in identifying which applicants are faking.
34. Criterion-related validity and faking
- Criterion-related validity is the correlation
between a test and an important work outcome,
such as job performance. - It is logical to assume that as applicants fake
more, the test will be less able to predict
important work outcomes.
35. Criterion-related validity and faking
- Students' conscientiousness ratings (measured with personality and biodata instruments) were much less predictive of supervisor ratings when they completed the measures under fake-good instructions.
- Douglas, McDaniel, & Snell (1996)
- The general pattern in applied samples is similar, as predictive validity is highest in incumbent (supposedly honest) samples, slightly lower for applicants, and drastically lower under fake-good directions.
- Hough (1998)
- These findings are commonly interpreted as supporting the hypothesis that faking may lower criterion-related validity, but it often does not do so drastically.
36. Criterion-related validity and faking
- There are a number of caveats to this general pattern regarding predictive validity.
- One is situation strength: when tests are administered in ways that restrict natural variation, criterion-related validity will drop.
- Beatty, Cleveland, & Murphy (2001)
- For example, if an organization clearly advertises that it only hires the most conscientious people, then applicants are more likely to fake to appear more conscientious.
37. Criterion-related validity and faking
- Another caveat is the number of people who fake.
- A Monte Carlo simulation found that the best-case scenario for faking is an all-or-nothing proposition: validity is retained with no fakers or many fakers, but if there is a small minority of fakers present, they are likely to be rewarded, thus dragging overall test validity down (see the simplified sketch below).
- Komar, Brown, Komar, & Robie (2008)
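The following is a simplified illustration of the all-or-nothing idea, not a reproduction of Komar et al.'s simulation: a varying proportion of simulated applicants inflate their scores by a constant amount, and the observed test-criterion correlation is tracked. Validity is untouched when no one or everyone fakes, and drops when only some applicants fake.

import numpy as np

rng = np.random.default_rng(2)
n = 2000

def simulate_validity(prop_fakers, true_validity=0.30, faking_boost=1.5):
    trait = rng.normal(0, 1, n)
    # Job performance correlates with the trait at the chosen true validity.
    performance = true_validity * trait + np.sqrt(1 - true_validity ** 2) * rng.normal(0, 1, n)
    observed = trait.copy()
    fakers = rng.random(n) < prop_fakers
    observed[fakers] += faking_boost          # fakers inflate their observed scores
    return np.corrcoef(observed, performance)[0, 1]

for p in [0.0, 0.1, 0.25, 0.5, 1.0]:
    print(f"proportion of fakers = {p:.2f}, observed validity = {simulate_validity(p):.2f}")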
38. Criterion-related validity and faking
- A final caveat is that the criterion-related validity of the test as a whole may not be sensitive to changes in the rank ordering of applicants.
- Komar et al. (2008)
- This assumption was tested by rank-ordering participants from two conditions (honest and fake-good) and then dividing the distribution into thirds.
- Mueller-Hanson, Heggestad, & Thornton (2003)
- The results indicated that the top third, which included a high percentage of participants who were given faking instructions, had low validity (r = .07), while the bottom third produced high validity (r = .45). A sketch of this kind of analysis follows below.
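Here is a minimal sketch of that style of analysis with simulated data (not Mueller-Hanson et al.'s): honest and fake-good respondents are pooled, ranked on the test, split into thirds, and the test-criterion correlation is computed within each third.

import numpy as np

rng = np.random.default_rng(3)
n_per_group = 300

trait = rng.normal(0, 1, 2 * n_per_group)
performance = 0.35 * trait + np.sqrt(1 - 0.35 ** 2) * rng.normal(0, 1, 2 * n_per_group)

observed = trait.copy()
# The second half of the sample responds under fake-good instructions.
observed[n_per_group:] += rng.normal(1.2, 0.5, n_per_group)

order = np.argsort(-observed)                 # rank from highest to lowest test score
for label, idx in zip(["top", "middle", "bottom"], np.array_split(order, 3)):
    r = np.corrcoef(observed[idx], performance[idx])[0, 1]
    print(f"{label:>6} third: r = {r:.2f}")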
39. Criterion-related validity and faking
- Thus, a criterion-related validity study may show
that the test predicts job performance. However,
the test may not predict well for the top scoring
individuals because these are the individuals who
fake.
40. Selection decisions and faking
- The last slide suggested that those who fake may
cluster at the top of the score list. - This introduces the topic of selection decisions
and faking.
41. Selection decisions and faking
- It is a common finding that people who fake (identified by higher social desirability scores or by higher proportions of those from a faking condition) will rise to the top of the selection distribution and increase their probability of being hired.
- Mueller-Hanson et al. (2003); Rosse et al. (1998)
- This situation worsens as the selection ratio is lowered (fewer people are selected), because more of those selected are likely to be fakers (see the sketch below).
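A minimal sketch of this point with simulated data: under top-down selection, the share of fakers among those hired grows as the selection ratio shrinks. The 15% faking rate and the one-standard-deviation score boost are illustrative assumptions, not figures from the cited studies.

import numpy as np

rng = np.random.default_rng(4)
n = 1000
is_faker = rng.random(n) < 0.15                               # assume 15% of applicants fake
scores = rng.normal(0, 1, n) + np.where(is_faker, 1.0, 0.0)   # fakers gain about 1 SD

for selection_ratio in [0.50, 0.25, 0.10, 0.05]:
    hired = np.argsort(-scores)[: int(n * selection_ratio)]   # hire the top scorers
    print(f"selection ratio {selection_ratio:.2f}: "
          f"{is_faker[hired].mean():.0%} of those hired are fakers")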
42. Selection decisions and faking
- One study obtained applicant personality scores and then honest scores one month later.
- Out of 60 participants, one individual who was ranked 4th on the applicant test dropped to 52nd on the honest test, indicating a large amount of faking.
- Griffeth, Chmielowski, & Yoshita (2005)
43. Selection decisions and faking
- Numerous additional studies have provided similar
findings, suggesting that the rank order of
applicants will change considerably under
different motivational and instructional
conditions. - This pattern is usually attributed to faking
behavior, but it can also be partly explained by
random or chance variation.
44. Selection decisions and faking
- People might score higher or lower on a second test administration due to random factors (e.g., feeling ill).
- Regardless, these consistent findings mean that users of non-cognitive tests cannot simply rely on a test's predictive validity to justify its utility as a selection device.
45. Construct validity and faking
- The construct validity of a test concerns its internal structure and its relationships with other variables.
- Construct validity helps one to understand what the test measures and what it does not.
46. Construct validity and faking
- Construct validity is often overlooked in favor
of criterion-related validity. - However, construct validity is crucially
important regarding the quality of what is
measured. - Construct validity can also help us understand
faking.
47. Construct validity and faking
- Factor analysis is a statistical method that helps determine the constructs measured by a test.
- Research indicates that construct validity does indeed drop when faking is likely present.
- The factor structure of non-cognitive tests, especially personality tests, tends to degrade when applicants are compared with incumbents: an extra factor often emerges, with each item loading on that factor in addition to loading on the hypothesized factors (see the sketch below).
- Zickar & Robie (1999); Cellar, Miller, Doverspike, & Klawsky (1996)
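A minimal sketch of the idea with simulated data: faking adds a shared component across all items, which shows up as extra common variance (an additional factor) in applicant-like responses. For brevity the sketch inspects the leading eigenvalues of the item correlation matrix rather than fitting a full factor model; the two-trait, eight-item structure is a hypothetical illustration.

import numpy as np

rng = np.random.default_rng(5)
n, items_per_trait = 500, 4

def make_responses(faking=False):
    traits = rng.normal(0, 1, (n, 2))                  # two intended constructs
    loadings = np.zeros((2, 2 * items_per_trait))
    loadings[0, :items_per_trait] = 0.7                # items 1-4 load on trait 1
    loadings[1, items_per_trait:] = 0.7                # items 5-8 load on trait 2
    responses = traits @ loadings + rng.normal(0, 0.5, (n, 2 * items_per_trait))
    if faking:
        responses += 0.8 * rng.normal(0, 1, (n, 1))    # shared faking factor on every item
    return responses

for label, faking in [("incumbent-like", False), ("applicant-like", True)]:
    corr = np.corrcoef(make_responses(faking), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    print(f"{label:>15}: leading eigenvalues = {np.round(eigvals[:3], 2)}")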
48. Construct validity and faking
- This means that the non-cognitive constructs
actually change under faking conditions, shedding
some doubt as to how similar they remain to the
intended, less-biased constructs.
49. Summary of Faking Studies
- Applicants can fake and some do fake.
- Evidence for faking can be seen in various types
of studies. - But there is no good technology for
differentiating the fakers from the honest
respondents.
50. Practical issues in test delivery
51. Properties of the selection system
- Two key aspects of selection systems are particularly relevant to the issue of faking:
- Multiple-hurdle vs. compensatory systems
- Use and appropriate setting of cut scores
52. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A multiple-hurdle system involves a series of stages that an applicant must pass through to ultimately be hired for the job.
- This usually involves setting cut scores (a line below which applicants are removed from the pool) at each step, or for each test in a selection battery.
53. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A compensatory system, on the other hand, typically involves an overall score that is computed for each applicant, meaning that a high score on one test can compensate for a low score on another (see the sketch below).
- Bott, O'Connell, Ramakrishnan, & Doverspike (2007)
54. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- A common validation procedure involves setting cut scores based on incumbent data and then applying that standard to applicants.
- The higher means in applicant groups could result in systematic bias in the cut scores.
- Basically, since there is faking in applicant samples, using the cut score determined from incumbent data will result in too many applicants passing the cut score (see the sketch below).
- Bott et al. (2007)
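A minimal sketch of the cut-score issue with simulated distributions (the means and the 50% incumbent pass rate are illustrative assumptions): a cut score set at the incumbent median passes far more applicants when applicant means are elevated by faking.

import numpy as np

rng = np.random.default_rng(7)
incumbents = rng.normal(3.5, 0.5, 1000)       # honest incumbent scores
applicants = rng.normal(3.8, 0.5, 1000)       # applicant scores, elevated by faking

cut_score = np.median(incumbents)             # cut set so about half of incumbents pass
print("incumbent pass rate:", round(float((incumbents >= cut_score).mean()), 2))
print("applicant pass rate:", round(float((applicants >= cut_score).mean()), 2))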
55. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- Personality tests may best be used from a select-out rather than the traditional select-in perspective.
- Mueller-Hanson et al. (2003)
- This means that the non-cognitive measures' primary purpose would be to weed out the very undesirable candidates rather than to identify the applicants with the highest level of the trait.
- Don't hire the people who state that they are lazy and undependable.
- But know that many of the people who score well on the personality test are also lazy and undependable.
- Thus, the goal of the personality test is to reject those who are lazy and undependable and willing to admit it.
56. Properties of the selection system
Multiple-hurdle vs. compensatory systems
- Using a personality test or other non-cognitive measure as a screen-out allows many more applicants to pass the hurdle, thereby increasing the potential cost of the system.
- One still needs to screen the remaining applicants.
- Select-out may be a reasonable option under conditions of:
- A high selection ratio (many openings relative to the number of applicants)
- Or low cost per test administered (such as unproctored internet testing).
- Practitioners have to carefully consider and justify how the setting of cut scores matches the goals and constraints of different selection systems.
57. Situational judgment tests with knowledge instructions
- As noted in a previous presentation at this conference, situational judgment tests can be administered with knowledge instructions.
- Knowledge instructions ask the applicants to identify the best response or to rate all responses for effectiveness.
58. Situational judgment tests with knowledge instructions
- Knowledge instructions for situational judgment tests should make them resistant to faking.
- McDaniel, Hartman, Whetzel, & Grubb (2007); McDaniel & Nguyen (2001); Nguyen, Biderman, & McDaniel (2005)
59. Situational judgment tests with knowledge instructions
- Although resistant to faking, these tests still measure non-cognitive traits, specifically:
- Conscientiousness
- Agreeableness
- Emotional stability
- McDaniel, Hartman, Whetzel, & Grubb (2007)
60. Situational judgment tests with knowledge
instructions
- Thus, situational judgment tests hold great
promise for measuring non-cognitive traits while
reducing, and perhaps eliminating, faking. - There are some limitations.
61. Situational judgment tests with knowledge instructions
- Limitations:
- It is hard to target a situational judgment test to a particular construct.
- It is hard to build homogeneous scales.
- With personality tests, one can easily build a scale to measure conscientiousness and another to measure agreeableness.
- Situational judgment tests seldom have clear subtest scales.
62. Faking and cognitive ability
- The ability to fake may be related to cognitive ability, such that those who are more intelligent can fake better.
- The small literature on this topic is contradictory.
- If faking is dependent on cognitive ability, then faking should increase the correlation between personality and cognitive ability.
63. Faking and cognitive ability
- One advantage of non-cognitive tests is that they
show smaller mean differences across ethnic
groups. - If the ethnic group differences are due to mean
differences in cognitive ability, and if faking
increases the correlation between personality and
cognitive ability, faking should make the ethnic
group differences in personality larger.
64. Faking and cultural differences
- Almost all faking research is done with U.S.
samples. - The prevalence of faking might be substantially
larger in other cultures. - For example, in cultures where bribery is a
common business practice, one might expect more
faking.
65. Potential solutions to faking
66. Social desirability scales
- The literature is very clear that social desirability scales do not help in identifying fakers.
- Statistical corrections based on social desirability scales do not improve validity.
- Ellingson, Sackett, & Hough (1999)
- Ones, Viswesvaran, & Reiss (1996)
- Schmitt & Oswald (2006)
67. Frame of reference
- The rationale behind frame of reference testing
is to design tests that encourage test takers to
focus on their behavior in a particular setting
(e.g., work).
68. Frame of reference
- An example of frame of reference is the addition of the phrase "at work" at the end of each item.
- Typical item: I am dependable.
- Frame-of-reference item: I am dependable at work.
69. Frame of reference
- There is some evidence that frame-of-reference testing may increase validity.
- Bing, Whanger, Davison, & VanHook (2004); Hunthausen, Truxillo, Bauer, & Hammer (2003)
- However, there is no evidence that frame-of-reference testing reduces faking behavior.
70. Test instructions: Coaching
- If we want people to respond to our tests in a certain way, we can simply tell them via test instructions.
- Coaching is one kind of instruction, usually in the form of a vignette or example describing how to approach an item in a socially desirable way.
- Coaching predictably leads to faking behavior (as evidenced by higher test means) and is certainly a problem as advice on how to beat non-cognitive tests circulates around the internet.
71. Test instructions: Warning
- Another popular strategy is to warn test takers
that they will be identified and removed from the
selection pool if they fake (known as a warning
of identification and consequences).
72. Test instructions: Warning
- A meta-analysis indicated that warnings generally lower test means relative to standard instructions (d = .23), although there was considerable variability in the direction and magnitude of effects in the studies included.
- Dwight & Donovan (2003)
73. Test instructions: Warning
- Problems:
- Warnings may increase the correlation between the personality scales and cognitive ability.
- Vasilopoulos et al. (2005)
- Since one cannot actually identify the fakers, it is dishonest to warn test-takers that fakers can indeed be identified.
- Zickar & Robie (1999)
74. Test instructions: Warning
- If most applicants heed the warning and do not
fake, those who do fake may more easily obtain
higher test scores. - Thus, warnings are admittedly an imperfect method
for combating faking, and more research is needed
to determine the extent of their utility.
75. Get data other than self-report
- Personality and other non-cognitive constructs are often evaluated for selection purposes through ratings by others, including interviews and assessment centers.
- Approximately 35% of interviews explicitly measure non-cognitive constructs, such as personality and social skills, according to meta-analytic evidence.
- Huffcutt, Conway, Roth, & Stone (2001)
76. Get data other than self-report
- Similarly, many common assessment center dimensions involve non-cognitive aspects, including communication and influencing others.
- Arthur, Day, McNelly, & Edens (2003)
77. Get data other than self-report
- Little faking and impression management research
has examined faking in interviews and assessment
centers. - However, it is logical that those who would fake
in a personality inventory would also fake in an
interview or an assessment center.
78. Forced-choice measures
- Item 1: Choose one item that is most like you, and one item that is least like you.
79. Forced-choice measures
- Forced-choice measures differ from Likert-type scales because they take equally desirable items (desirability usually determined by independent raters) and force the respondent to choose (a scoring sketch follows below).
- Forced-choice has costs:
- Abandoning the interval-level scale of measurement
- Abandoning the clearer construct scaling that Likert measures offer.
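A minimal sketch of how one most/least forced-choice block might be scored; the items, trait keys, and the simple +1/-1 scoring rule are hypothetical illustrations, not a published scoring scheme.

from collections import defaultdict

# Each statement in the block is keyed to a trait; desirability is assumed
# to have been matched across statements by independent raters.
block = [
    {"text": "I finish tasks on time", "trait": "conscientiousness"},
    {"text": "I stay calm under pressure", "trait": "emotional_stability"},
    {"text": "I get along with everyone", "trait": "agreeableness"},
]

def score_block(block, most_index, least_index):
    scores = defaultdict(int)
    scores[block[most_index]["trait"]] += 1   # "most like me" adds a point
    scores[block[least_index]["trait"]] -= 1  # "least like me" subtracts a point
    return dict(scores)

# A respondent picks statement 0 as most like them and statement 2 as least like them.
print(score_block(block, most_index=0, least_index=2))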
80. Forced-choice measures
- Whether the benefits of forced-choice formats, such as potentially reducing faking, justify these costs is questionable.
- The effect of forced-choice on test means is unclear, as some studies show higher means for forced-choice compared with Likert measures and others indicate lower means.
- Heggestad, Morrison, Reeve, & McCloy (2006); Vasilopoulos, Cucina, Dyomina, Morewitz, & Reilly (2006)
81. Forced-choice measures
- Research on the effect of forced-choice on
selection decisions used items in both a
forced-choice and Likert format under
pseudo-applicant instructions (pretend you are
applying for a job). - Heggestad et al. (2006)
82. Forced-choice measures
- They compared the rank order produced by both tests to an honest condition using a different personality measure.
- Results showed few differences in the rank orders between the measures, offering preliminary evidence that forced-choice does not improve selection decisions.
- In summary, forced-choice tests do not necessarily reduce faking, and the statistical and conceptual limitations associated with their use probably do not justify replacing traditional non-cognitive test formats.
83. Recommendations for Practice
84. Avoid corrections
- Little evidence exists that social desirability scales or lie scales can identify faking.
- Many tests include lie scales with instructions about how to correct scores based on lie scales, with the justification that corrections will improve test validity.
- Rothstein & Goffin (2006)
- There is no evidence to support this assertion, rendering corrections a largely indefensible strategy.
85. Specify how non-cognitive measures fit the goals
of the selection system
- Given the consistent effect of faking on test
means, faking will affect cut scores and who is
selected in both compensatory and multiple hurdle
systems. - Bott et al. (2007)
- Cut scores may have to be adjusted upward if they
are set based on incumbent scores.
86. Specify how non-cognitive measures fit the goals
of the selection system
- The select-out strategy is an option.
- Reject applicants who are willing to admit that
they are lazy and undependable - Screen the remaining applicants with a maximal
performance measure that is faking-free or
faking-resistant. - Select-out is a good strategy when the selection
ratio is high (i.e., you will hire most of those
who apply).
87. Recognize that criterion-related validity may say little about faking
- It is common for a test to show a useful level of validity even when faking is known to be present.
- However, the fakers are represented in greater proportions at the high end of the test scores.
- The validity may be much worse among these applicants.
88. Manipulate the motivation of the applicants
- If applicants are given information about the job to which they are applying, they can fake their scores toward that stereotype.
- Mahar, Cologon, & Duck (1995)
- On the other hand, if applicants are informed about the potential consequences of poor fit, which faking could realistically lead to during the placement phase, they may be motivated to respond more honestly, and initial research indicates that this may be true.
- Nordlund & Snell (2006)
89. Conclusion
- Non-cognitive tests can be faked.
- Non-cognitive tests are faked.
- There is no method to eliminate faking.
- Consider using non-cognitive tests as select-out
screens
90. Conclusion
- Use maximal performance tests (cognitive ability
and job knowledge) to screen those who remain. - Consider measuring non-cognitive traits with
faking-resistant situational judgment tests with
knowledge instructions.
91. References
- References are in the book chapter.