Statistics for Proficiency Testing

Transcript and Presenter's Notes

1
Statistics for Proficiency Testing
  • Hong Kong Accreditation Service
  • 10 September, 2009
  • Daniel Tholen, M.S.

2
Overview of Statistical Methods
  • Requirements for statistical methods from ISO/IEC
    17043
  • Overview of statistical procedures in the major
    standards
  • Determining the assigned value
  • Determining the performance score
  • Checking homogeneity and stability
  • Examples from ISO, APLAC PT, class

3
Overview of Modules 2-4
  • Cover all areas of PT
  • Chemical testing
  • Medical testing
  • Calibration
  • Concepts and procedures cover inspection
  • Practical application with examples from PT
    projects.

4
Documents for PT Statistics
  • ISO/IEC FDIS 17043 Conformity Assessment
    General requirements for Proficiency Testing (was
    Guide 43-1)
  • ISO 13528 Statistical Methods for use in
    Proficiency Testing by Interlaboratory
    Comparisons
  • IUPAC Harmonized Protocol for PT of Chemical
    Analytical Laboratories, 2006
  • Previous APLAC Statistical procedures

5
ISO/IEC 17043 Annex B
  • Same basic methods as ISO/IEC Guide 43-1 Annex A
    on Statistical Methods
  • Adds considerations for semi-quantitative and
    categorical data
  • Main topics
  • Determine Assigned Value
  • Calculate Performance Statistic
  • Evaluate Performance
  • Determination of Homogeneity and Stability

6
17043 Annex B
  • References ISO 13528 (2005) and IUPAC Harmonized
    Protocol (2006)
  • Adds considerations for GUM

7
ISO 13528
  • Companion to ISO Guide 43-1, Annex A
  • Written as a Standard
  • High interest / widely used
  • Goal is to describe optimal procedures, but to
    allow other procedures as long as they are
  • Statistically valid
  • Fully described to participants

8
ISO 13528
  • Written by ISO TC69, SC6
  • Approved work item in 1997
  • Published in 2005
  • Reaffirmed in 2009
  • Proposal to revise for ISO/IEC 17043
  • Correction for discovered gaps
  • Add qualitative and ordinal data
  • Harmonize with GUM and VIM
  • Other? from Seminar?

9
Reporting considerations ISO 13528, section 4.6
  • Possible conflicts with requirement for
    laboratories to treat and report PT same as for
    customer
  • NO TRUNCATED RESULTS (!!??)
  • "Less than" values not allowed
  • Possible resolution
  • Restriction only applies to consensus

10
Reporting considerations ISO 13528, section 4.6
  • Rounding
  • Independently estimate typical repeatability $s_r$
  • Do not round by more than $s_r/2$
  • Number of replicates
  • Concern for getting accurate estimate of bias
  • When a method's repeatability is large, it can
    confuse interpretation of scores
  • Determine number n of replicates so that
  • $s_r/\sqrt{n} < 0.3\,s_P$

11
Limiting the effect of repeatability Example
  • Say $s_P = 5$ and $s_r = 2$
  • Then $s_r/\sqrt{n} < 0.3\,s_P$
  • So $2/\sqrt{n} < 0.3(5) = 1.5$, or
  • $2/1.5 < \sqrt{n}$ → $(1.33)^2 < n$ → $1.78 < n$
  • Or $n = 2$ replicates
  • This criterion can lead to large n replicates.
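
The replicate criterion can be checked numerically. The following is a minimal sketch, assuming the rule is exactly s_r/sqrt(n) < 0.3 s_P; the function name min_replicates and its factor parameter are illustrative and do not come from ISO 13528.

```python
import math

def min_replicates(s_r, s_p, factor=0.3):
    """Smallest integer n with s_r / sqrt(n) < factor * s_p (strict)."""
    if s_r <= 0 or s_p <= 0:
        raise ValueError("standard deviations must be positive")
    # Rearranged criterion: n > (s_r / (factor * s_p))**2
    bound = (s_r / (factor * s_p)) ** 2
    return math.floor(bound) + 1

# Values from the example: s_P = 5, s_r = 2, so n > (2/1.5)^2, about 1.78
print(min_replicates(2, 5))
# A method whose repeatability equals s_P needs many more replicates
print(min_replicates(5, 5))
```

The second call shows how quickly n grows when repeatability is large relative to s_P.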

12
IUPAC Harmonized Protocol (2006)
  • Available free at IUPAC website
  • Revision of 1996 Version
  • Update to ISO 13528
  • Revises selected portions
  • Homogeneity criteria
  • Detection of bimodality (kernel analysis)
  • Strong opinions

13
APLAC PT Committee
  • Historically followed NATA (Australia) procedures
  • Convention gradually changing to ISO 13528 and
    IUPAC
  • No standard procedures
  • Statistically valid
  • Explained to participants

15
Requirements for Statistical Methods ISO/IEC
17043
  • 4.4.1.4 Access to the necessary technical
    expertise, including statistics
  • 4.4.3.2 Procedures for ensuring homogeneity and
    stability in accordance with appropriate
    statistical designs, including random selection
    of items

16
Statistical Methods in 17043 4.4.4 Statistical
Design
  • 4.4.4.1 Designs shall meet the objectives of the
    scheme, based on the nature of the data
  • NOTE 1 Covers the process of planning,
    collection, analysis and reporting
  • NOTE 2 Data analysis methods could vary from the
    very simple to the complex
  • NOTE 3 Statistical design and data analysis
    methods can be taken directly from specifications
    by regulatory agencies or customers.
  • NOTE 4 In the absence of reliable information, a
    preliminary interlaboratory comparison can be
    used.

17
Statistical Design 4.4.4
  • 4.4.4.2 Documented statistical design and data
    analysis methods to identify the assigned value
    and evaluate participant results. Demonstrate
    that the statistical assumptions are reasonable.

18
Statistical Design 4.4.4
  • 4.4.4.3 Give careful consideration to the
    following
  • the accuracy and measurement uncertainty required
    or expected
  • the minimum number of participants
  • number of significant figures and decimal places
  • number of proficiency test items and repeat
    tests
  • procedures to establish evaluation criteria
  • procedures to be used to handle outliers
  • procedures for the evaluation of excluded values
  • the objectives to be met for the design and
    frequency of proficiency testing rounds.

19
ISO/IEC 17043 Other
  • 4.4.4.5 Requirements for assigned values
    (traceability and uncertainty) calibration,
    testing, consensus
  • 4.7.1 Data analysis
  • 4.7.2 Evaluation of performance
  • 4.8 Reports
  • including summary statistics for methods used by
    other participants

20
ISO 17043 Annex B
  • B.1 General
  • Many types of PT, many types of data
  • Reference ISO 13528 and IUPAC Harmonized Protocol
  • Note that ISO 13528 allows other techniques if
    they are statistically valid and explained to
    participants

21
ISO 17043 Annex B
  • B.2 Determining the assigned value and its
    uncertainty
  • Definition
  • 3.1 assigned value
  • value attributed to a particular property of a
    proficiency test item

22
B.2 Determining the assigned value
  • Various procedures available listed below in an
    order of increasing uncertainty
  • known values by formulation (e.g. manufacture
    or dilution)
  • certified reference values (for quantitative
    tests)
  • reference values
  • consensus values from expert participants
  • consensus values from participants.

23
Determining the assigned value
  • Procedures for qualitative data (nominal or
    categorical), or semi-quantitative values
    (ordinal) are not in ISO 13528 or the IUPAC
    Harmonized Protocol.
  • Generally determined by expert judgment or
    manufacture.
  • May use a consensus value, such as agreement of a
    predetermined percentage of responses (e.g., 80%)
  • May use median or mode, not mean

24
Determining the assigned value
  • No such thing as standard deviation
  • IT IS NOT APPROPRIATE to calculate the mean or SD
    of semi-quantitative values.

25
Example Semi-Quantitative
  • Measurand: level of reaction, by category
  • 1 = no reaction, normal
  • 2 = mild reaction
  • 3 = moderate reaction
  • 4 = severe reaction
  • 2 PT samples, A and B
  • 50 participants

26
Example Semi-Quantitative
  • Sample A
  • 1: 20 results (40%)
  • 2: 18 results (36%)
  • 3: 10 results (20%)
  • 4: 2 results (4%)
  • Sample B
  • 1: 8 results (16%)
  • 2: 12 results (24%)
  • 3: 20 results (40%)
  • 4: 10 results (20%)
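
As a sketch of how these ordinal results could be summarized without a mean or SD, the snippet below finds the mode, the (lower) median category, and the share of responses at the mode for Samples A and B. The function name ordinal_summary and the use of the lower median are illustrative assumptions, not procedures taken from ISO 13528 or ISO/IEC 17043.

```python
def ordinal_summary(counts):
    """counts maps an ordered category code to its number of results."""
    total = sum(counts.values())
    categories = sorted(counts)
    # Mode: most frequent category (first one wins on ties)
    mode = max(categories, key=lambda c: counts[c])
    # Lower median: first category whose cumulative count reaches half
    cumulative = 0
    median = None
    for c in categories:
        cumulative += counts[c]
        if cumulative * 2 >= total:
            median = c
            break
    mode_share_pct = 100.0 * counts[mode] / total
    return {"mode": mode, "median": median, "mode_share_pct": mode_share_pct}

sample_a = {1: 20, 2: 18, 3: 10, 4: 2}
sample_b = {1: 8, 2: 12, 3: 20, 4: 10}

for name, counts in (("A", sample_a), ("B", sample_b)):
    print(name, ordinal_summary(counts))
```

For both samples the most frequent category holds only 40% of responses, so a consensus rule such as 80% agreement would not be met.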

27
Responses for Samples A and B
28
Determining the assigned value
  • Other considerations
  • If consensus, control outliers
  • If consensus, check trueness of process
  • Criteria for acceptability on the basis of
    uncertainty of the assigned value (for all
    assigned values, especially consensus)

29
ISO 13528 Procedures
  • Calculate Summary Statistics
  • Outlier detection and removal are allowed if done
    in a statistically valid way
  • Robust measures are preferred
  • Mean
  • SD
  • Preferred robust method is given, others are
    allowed if
  • Statistically valid
  • Fully described to participants

30
ISO 13528 Procedures
  • Determine Assigned Value
  • Determined before PT shipment
  • Result from formulation
  • Certified reference value
  • Other reference values
  • Determined from PT data
  • Consensus of experts
  • Consensus of participants
  • Control the uncertainty of the assigned value

31
13528 - Robust Analysis
  • Algorithm A for mean and SD
  • Starts with $x^* = \mathrm{median}(x_i)$
  • $s^* = 1.483 \times \mathrm{median}\,|x_i - x^*|$
  • Limit data at $x^* + 1.5s^*$ and $x^* - 1.5s^*$
  • Extreme values trimmed to $x^* \pm 1.5s^*$
  • Option to use initial $x^*$ and $s^*$ and skip
    iterations
  • (Per NATA and many APLAC studies)

32
13528 - Robust Analysis
  • Calculate new $x^* = (\sum x_i^*)/p$
  • $s^* = 1.134\sqrt{\sum (x_i^* - x^*)^2/(p-1)}$
  • Trim data again, at $1.5s^*$
  • Recalculate new $x^*$ and $s^*$
  • Repeat until convergence
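
The iteration described on these two slides can be sketched as follows. This is an illustrative implementation of the steps as listed (median start, limits at 1.5 s, factor 1.134), not a validated copy of the standard's Algorithm A; the convergence tolerance, the iteration cap, and the name algorithm_a are assumptions.

```python
import math
import statistics

def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean and SD by iterated limiting of extreme values."""
    x = list(values)
    p = len(x)
    # Initial estimates: median and scaled median absolute deviation
    x_star = statistics.median(x)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in x])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        lo, hi = x_star - delta, x_star + delta
        # Pull values outside the limits back to the limits
        limited = [min(max(v, lo), hi) for v in x]
        new_x = sum(limited) / p
        new_s = 1.134 * math.sqrt(
            sum((v - new_x) ** 2 for v in limited) / (p - 1)
        )
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star

# Hypothetical data with one extreme result
results = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 15.0]
robust_mean, robust_sd = algorithm_a(results)
print(robust_mean, robust_sd)
```

With the extreme result limited rather than deleted, the robust mean stays near 10 while the ordinary mean of the same data is about 10.7.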

33
13528 Quality check Section 5.7
  • When AV is determined prior to PT
  • Compare AV with robust mean or results
  • Determine uncertainty of comparison $u_d$
  • If difference exceeds $2u_d$ then investigate
  • When AV is determined from consensus
  • Compare AV with a reference value from a
    competent laboratory (could come from homogeneity
    data)
  • Compare robust SD with experience

34
Determine the Standard Uncertainty of the
Assigned Value
  • Determined before PT shipment
  • Result from formulation
  • Uncertainty per manufacture process, usually very
    small relative to measurement uncertainty
  • Certified reference value
  • Uncertainty provided with certificate
  • Other reference values
  • Uncertainty calculated per GUM or other procedure

35
Determine the Standard Uncertainty of the
Assigned Value
  • Determined from data in PT shipment
  • Consensus of expert laboratories (p of them)
  • Each lab should know their MU, and report it
  • $u_X = (1.25/p)\sqrt{\sum u_i^2}$ for robust mean (median)
  • Caution about bias in experts
  • Consensus of participants (p of them)
  • Calculate robust mean and SD ($s^*$)
  • $u_X = 1.25\,s^*/\sqrt{p}$
  • Caution about bias due to method mix
  • Caution about lack of consensus
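
A short sketch of the two uncertainty formulas on this slide, with hypothetical input values. The function names are illustrative only.

```python
import math

def u_from_experts(lab_uncertainties):
    """u_X = (1.25 / p) * sqrt(sum of u_i^2) for p expert laboratories."""
    p = len(lab_uncertainties)
    return 1.25 / p * math.sqrt(sum(u ** 2 for u in lab_uncertainties))

def u_from_participants(robust_sd, p):
    """u_X = 1.25 * s* / sqrt(p) for a consensus of p participants."""
    return 1.25 * robust_sd / math.sqrt(p)

# Hypothetical inputs: two expert labs, then 25 participants with s* = 2.0
print(u_from_experts([0.3, 0.4]))
print(u_from_participants(2.0, 25))
```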

36
Limiting the uncertainty of the Assigned Value
(X) 13528 Section 4.2
  • Establish limits for uncertainty of AV
  • $u(X) < 0.3\,s_P$
  • When using fixed limits (E)
  • $u(X) < 0.3(E/3)$
  • $u(X) < E/10$

37
Limiting the uncertainty of the Assigned Value
(X) 13528 Section 4.2
  • If this cannot be met then
  • Look for a better way to determine AV
  • Incorporate uncertainty in score
  • z'
  • En
  • zeta
  • Advise participants of large uncertainty

39
Limiting the uncertainty of the Assigned Value
(X) Example
  • When consensus mean and SD are used to determine
    performance
  • Then $u(X) = SD/\sqrt{n}$
  • So one can have very small uncertainty with large
    number of labs.
  • What n is needed to assure criterion is met?

40
Limiting the uncertainty of the Assigned Value
(X) Example
  • What n is needed to assure criterion is met?
  • Need $1/\sqrt{n} < 0.3$, or $n > (1/0.3)^2 = 11.1$, or $n = 12$
  • If $n \le 11$, then cannot meet criterion.
  • If robust mean is used, $u(X) = 1.25\,SD/\sqrt{n}$, so
    $n > (1.25/0.3)^2 = 17.4$, or
  • $n = 18$
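
The participant count needed for the criterion can be verified with a few lines. This sketch assumes the SD for proficiency equals the consensus SD, so the SD cancels and only the multiplier in u(X) = factor x SD / sqrt(n) matters; min_participants is an illustrative name.

```python
import math

def min_participants(factor=1.0, limit=0.3):
    """Smallest n with factor / sqrt(n) < limit (strict inequality)."""
    bound = (factor / limit) ** 2
    return math.floor(bound) + 1

print(min_participants())             # plain mean: n > 11.1
print(min_participants(factor=1.25))  # robust mean: n > 17.4
```

Because the 1.25 multiplier is squared when solving for n, the robust case needs about 1.56 times as many participants as the plain-mean case.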

41
B.3 Calculating performance statistics
  • Quantitative results
  • D and D%
  • z, z'
  • $E_n$, zeta
  • Qualitative/semi-quantitative results
  • Combined performance scores

42
Calculate Performance Statistic
  • Estimates of bias
  • Difference $D = x - X$
  • Percentage difference $D\% = 100(x - X)/X$
  • D and D% can be evaluated with fixed limits
  • Estimates of Relative Performance
  • rank or percentage rank (not recommended)
  • z score (recommended) $z = (x - X)/s_P$
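
A minimal sketch of these three statistics for one participant result x against an assigned value X. The input values are hypothetical, and the SD passed to z_score is assumed to be the SD for proficiency assessment.

```python
def difference(x, assigned):
    """D = x - X"""
    return x - assigned

def percent_difference(x, assigned):
    """D% = 100 (x - X) / X"""
    return 100.0 * (x - assigned) / assigned

def z_score(x, assigned, sd):
    """z = (x - X) / SD for proficiency assessment"""
    return (x - assigned) / sd

# Hypothetical result 10.5 against assigned value 10.0 with SD 0.25
x, X, sd = 10.5, 10.0, 0.25
print(difference(x, X), percent_difference(x, X), z_score(x, X, sd))
```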

43
Determine Performance Interval
  • Fixed Limits (or Fitness for Purpose)
  • Can come from methods for SD
  • Not widely used
  • Preferred for interpretation
  • Fixed percentage across range
  • Fixed value across range
  • Mixed or segmented.

44
SD for Proficiency Assessment
  • SD for proficiency: $s_P$
  • 5 ways to get SD for Proficiency (for z scores)
  • By prescription (set by Accreditor or advisors)
  • By experience (perception) of experts
  • From a general model (e.g., Horwitz)
  • By a precision experiment (ISO 5725-2)
  • From participant data (robust SD)
  • Should be chosen as fitness for purpose, under a
    common model for all analytes

45
SD for proficiency testing
  • Discussed in detail in section 6 of 13528
  • SD as used in z scores
  • Can also be thought of as 1/3 of evaluation
    interval
  • (when $|z| > 3$ is action signal)
  • For example if fixed interval is $E = 10$...
  • Then $E = 3\,s_P$
  • $s_P = E/3 = 10/3 = 3.3$

46
Scores that use uncertainty
  • En and zeta consider uncertainty of participant
    result and assigned value
  • Requires consistent determination of uncertainty
    by all laboratories
  • En in common use in calibration
  • z' uses uncertainty of assigned value only
  • Useful when too much uncertainty in assigned
    value.
  • Same as z when small uncertainty

47
Scores that use uncertainty
  • $E_n$ score (Error, normalized)
  • $E_n = (x - X)/\sqrt{U_{lab}^2 + U_{ref}^2}$
  • z' scores (like z, includes $u_X$)
  • $z' = (x - X)/\sqrt{s_P^2 + u_X^2}$
  • zeta scores (like $E_n$, but with std. uncertainty)
  • $\zeta = (x - X)/\sqrt{u_{lab}^2 + u_{ref}^2}$
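
The three uncertainty-based scores can be sketched as below. The parameter names are illustrative; U denotes an expanded uncertainty and u a standard uncertainty, which is the distinction drawn on this slide.

```python
import math

def en_score(x, assigned, U_lab, U_ref):
    """E_n: difference over combined expanded uncertainties."""
    return (x - assigned) / math.sqrt(U_lab ** 2 + U_ref ** 2)

def z_prime(x, assigned, sd, u_assigned):
    """z': like z, but the denominator includes u_X."""
    return (x - assigned) / math.sqrt(sd ** 2 + u_assigned ** 2)

def zeta_score(x, assigned, u_lab, u_ref):
    """zeta: like E_n, but with standard uncertainties."""
    return (x - assigned) / math.sqrt(u_lab ** 2 + u_ref ** 2)

# Hypothetical values
print(en_score(10.5, 10.0, 0.3, 0.4))
print(z_prime(10.5, 10.0, 0.25, 0.05))
print(zeta_score(10.5, 10.0, 0.15, 0.2))
```

When u_X is small relative to the SD, z' approaches the ordinary z score, as the previous slide notes.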

48
Evaluate performance
  • Compare performance statistic against criteria,
    determine acceptability i.e.,
  • For fixed limits
  • $|\text{Bias}| < \text{Limit}$ → acceptable
  • $|\text{Bias}| \ge \text{Limit}$ → unacceptable

49
Evaluate performance
  • Compare performance statistic against criteria,
    determine acceptability i.e.,
  • For z, z', zeta
  • $-2 < z < 2$ → acceptable
  • $-3 < z \le -2$ or $2 \le z < 3$ → warning signal
  • $z \le -3$ or $z \ge 3$ → unacceptable

50
Evaluate performance
  • $|E_n| < 1$ → acceptable
  • $|E_n| \ge 1$ → unacceptable
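
The evaluation rules for z-type scores (limits at 2 and 3) and for En (limit at 1) can be expressed as two small functions. This is a sketch; the handling of values exactly on a boundary follows the reading that the stricter category applies at the boundary, which is an assumption.

```python
def classify_z(score):
    """Evaluate a z, z' or zeta score against the 2 and 3 limits."""
    magnitude = abs(score)
    if magnitude < 2:
        return "acceptable"
    if magnitude < 3:
        return "warning signal"
    return "unacceptable"

def classify_en(score):
    """Evaluate an E_n score against the limit of 1."""
    return "acceptable" if abs(score) < 1 else "unacceptable"

for z in (0.4, -2.5, 3.2):
    print(z, classify_z(z))
print(0.8, classify_en(0.8))
```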

51
Combined performance scores
  • Analyze data for each item independently
  • Special process for Youden pairs
  • Can be other reasons to combine results
  • Precision
  • Linearity
  • Can count number of satisfactory scores
  • Not recommended to combine performance scores
    (such as average z)

52
Graphic Reports for PT round
  • Rank vs. Result (with or without MU)
  • Used to check Normal distribution
  • Used to visualize data

55
Graphic Reports for PT round
56
Graphic Reports for PT round
  • Histograms of results or scores

59
Combined Performance Scores
  • Generally discouraged in 17043 and 13528
  • Can miss problem on one sample or measurand
  • OK only for statistics that have the same
    distribution (rare) sometimes true for
    performance scores.

60
Graphic Reports for PT round
  • Bar plot of standardized performance statistics
    (z, h, k)
  • z, or other standardized scores (% error)
  • h and k plots from ISO 5725
  • h same as z, except always from sample SD
  • k for repeatability ($n \ge 2$ replicates)

62
Graphic Reports for PT round
  • Youden plot (usually w/median lines)
  • In this document, only z scores are used.
  • Should use sample results, for clarity
  • Provides evidence of related results, which can
    suggest consistent bias
  • Consistent bias can suggest lack of clearly
    defined method.
  • Confirm with rank correlation test.

64
Youden Plot Example
67
Graphic Reports for More than One PT Round
  • Line plot (Shewhart plot) for scores on previous
    rounds
  • Use any standardized score
  • Show evaluation intervals
  • Show test dates

69
Graphic Reports for more than One PT Round
  • Dot Plot
  • Show all samples on same chart
  • Show evaluation intervals
  • Show dates or scheme codes

71
Graphic Reports for More than One PT Round
  • CUSUM control chart
  • Can show trends affecting bias
  • Choose some number to use (rolling sum)
  • Sums should trend to zero
  • Not sensitive to current problems

73
Graphic Reports for More than One PT Round
  • Plot of Standardized Laboratory Biases against
    assigned value
  • Shows relationship between score and level
  • Can mask time effect; do both

75
Demonstration of homogeneity and stability in
17043
  • Ensure sufficient homogeneity so as to not impact
    evaluation of performance
  • Different needs for determining homogeneity and
    stability in PT and for Reference Materials
    (ISO Guides 34 and 35)
  • PT (and RM) needs to ensure sufficient
    homogeneity and stability
  • CRM needs to estimate SD between samples, and
    instability as part of uncertainty of assigned
    value

76
Homogeneity - 13528
  • Homogeneity
  • 10 or more samples, 2 replicates
  • $SD_S$ for samples (ANOVA calculation)
  • $SD_S < 0.3\,s_P$
  • No F test
  • Can use experience to reduce testing
  • When evidence and theory prove homogeneous
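
One way to compute the between-sample SD from duplicate measurements is sketched below. It assumes the usual duplicate-design ANOVA identities (variance of sample means minus half the within-sample variance); the function names and the test data are hypothetical, and a negative variance estimate is set to zero.

```python
import math
import statistics

def between_sample_sd(duplicates):
    """duplicates: list of (a, b) pairs, one pair per sample."""
    g = len(duplicates)
    sample_means = [(a + b) / 2 for a, b in duplicates]
    s_x = statistics.stdev(sample_means)              # SD of sample means
    s_w = math.sqrt(sum((a - b) ** 2 for a, b in duplicates) / (2 * g))
    variance = s_x ** 2 - s_w ** 2 / 2                # between-sample part
    return math.sqrt(max(variance, 0.0))

def is_homogeneous(duplicates, s_p):
    """Check SD_S < 0.3 * s_P."""
    return between_sample_sd(duplicates) < 0.3 * s_p

# Hypothetical duplicate results for 10 samples
data = [(10.1, 10.0), (9.9, 10.0), (10.0, 10.2), (10.1, 9.9), (10.0, 10.0),
        (9.8, 10.0), (10.1, 10.1), (10.0, 9.9), (10.2, 10.0), (9.9, 10.1)]
print(between_sample_sd(data), is_homogeneous(data, s_p=0.5))
```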

77
Homogeneity IUPAC (2006)
  • Similar to 13528, larger criterion for
    acceptance, more complex statistics.
  • 10 or more samples, in duplicate
  • Sufficient repeatability $s_{an} < 0.5\,s_P$
  • Cochran test for duplicates
  • Visual check for anomalies
  • Non-random differences between replicates
  • Time trend across manufacture

78
Homogeneity IUPAC (2006)
  • Calculate variances
  • $s_{an}^2$ (between replicates)
  • $s_{sam}^2$ (between samples)
  • $s_{all}^2 = (0.3\,s_P)^2$
  • Calculate acceptance criterion
  • Take $F_1$ and $F_2$ from Tables
  • $c = F_1 s_{all}^2 + F_2 s_{an}^2$
  • If $s_{sam}^2 < c$ then acceptable homogeneity
  • Since $F_1 > 1$ and $s_{an}^2 > 0$ and $s_{all}^2$ = 13528
    criterion, this is always an easier criterion
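
A sketch of the acceptance test on this slide. The variances are taken as already computed, and F1 and F2 are passed in because they come from tables in the protocol. The values 1.88 and 1.01 used in the example are what I believe the tables give for 10 samples; treat them as an assumption and confirm against the published protocol before use.

```python
def iupac_homogeneity(s_an_sq, s_sam_sq, s_p, f1, f2):
    """Return (acceptable, c) for the criterion s_sam^2 < c."""
    s_all_sq = (0.3 * s_p) ** 2          # allowable between-sample variance
    c = f1 * s_all_sq + f2 * s_an_sq     # critical value
    return s_sam_sq < c, c

# Hypothetical variances; F1 and F2 assumed for 10 samples (verify in tables)
acceptable, c = iupac_homogeneity(
    s_an_sq=0.01, s_sam_sq=0.10, s_p=1.0, f1=1.88, f2=1.01
)
print(acceptable, c)
```

In this example the between-sample variance of 0.10 exceeds (0.3 s_P)^2 = 0.09 but is below c, which illustrates why the criterion is the easier of the two to meet.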

79
Homogeneity - traditional
  • F test (allowed, not recommended)
  • $F = SD_S^2/s_r^2$
  • $s_r$ = repeatability SD; $SD_S$ = ANOVA treatment SD
  • $F_{crit} = F(0.05,\ k-1,\ k(n-1))$; k samples, n
    replicates
  • High $s_r$ → insensitive test (large $SD_S$ passes)
  • Low $s_r$ → too sensitive test (small $SD_S$ fails)

80
Stability - 13528
  • Stability
  • Analysis on or after closing date
  • (2-)3 samples, (1-)2 replicates, depending on
    experience
  • Calculate overall mean
  • $|\text{Mean}(H) - \text{Mean}(S)| < 0.3\,s_P$
  • No statistical t test
  • High $s_r$ → insensitive test (big difference passes)
  • Low $s_r$ → too sensitive test (small difference
    fails)
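
The stability check reduces to comparing two means against 0.3 s_P. A minimal sketch with hypothetical data follows; is_stable and its argument names are illustrative.

```python
import statistics

def is_stable(homogeneity_results, stability_results, s_p):
    """Check |Mean(H) - Mean(S)| < 0.3 * s_P."""
    shift = abs(
        statistics.mean(homogeneity_results)
        - statistics.mean(stability_results)
    )
    return shift < 0.3 * s_p

# Hypothetical results from the homogeneity check and from the closing date
print(is_stable([10.0, 10.2, 9.8], [9.9, 10.1, 10.0], s_p=0.5))
print(is_stable([10.0, 10.2, 9.8], [10.5, 10.6, 10.4], s_p=0.5))
```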

81
Stability - practical
  • Can use experience and technical knowledge
    (backed by data)
  • Same measurand, same manufacture process, same
    matrix
  • For calibration artefact, homogeneity and
    stability are usually the same

82
APLAC (NATA) Robust procedure
  • Calculate Quartiles Q1, median, Q3
  • IQR = Q3 − Q1
  • Median is an estimate of mean
  • Normalized IQR is an estimate of SD
  • IQRN = 0.7413 × IQR
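
A sketch of the median and normalized IQR calculation. Quartile conventions differ between software packages; the "inclusive" interpolation method of Python's statistics.quantiles is used here as an assumption and may not match the convention used in NATA or APLAC reports.

```python
import statistics

def median_and_niqr(values):
    """Return (median, normalized IQR) as robust estimates of mean and SD."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, 0.7413 * (q3 - q1)

# Hypothetical results
median, niqr = median_and_niqr([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(median, niqr)
```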

83
APLAC performance statistics
  • Calculate relative performance measures
  • Between lab agreement
  • $S_i = (A_i + B_i)/\sqrt{2}$
  • Within lab agreement
  • $D_i = (A_i - B_i)/\sqrt{2}$ if median($A_i$) > median($B_i$)
  • $D_i = (B_i - A_i)/\sqrt{2}$ if median($A_i$) < median($B_i$)
  • Calculate z-scores for these measures
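
The standardized sum and difference for paired samples A and B can be sketched as follows. The slide does not say what to do when the two medians are equal; this sketch uses A minus B in that case, which is an assumption. The resulting S and D values would then be converted to z-scores, for example with a median and normalized IQR.

```python
import math
import statistics

def sum_and_difference(a_results, b_results):
    """Return lists (S, D), one value per laboratory."""
    root2 = math.sqrt(2)
    # Orient the difference so the sample with the larger median comes first
    a_first = statistics.median(a_results) >= statistics.median(b_results)
    sign = 1 if a_first else -1
    s = [(a + b) / root2 for a, b in zip(a_results, b_results)]
    d = [sign * (a - b) / root2 for a, b in zip(a_results, b_results)]
    return s, d

# Hypothetical results from three laboratories
s_values, d_values = sum_and_difference([10, 12, 11], [8, 9, 10])
print(s_values)
print(d_values)
```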

85
The End
  • Thank you!