Title: Identification Numbers and Error Detection
1Identification Numbers and Error Detection
2Where do you find identification numbers?
- Checks
- Credit cards
- Drivers licenses
- VIN Numbers
- Zip Codes
- SSN
- ISBN Numbers
- Bar Codes/UPCs (Universal Product Codes)
3Check Digits
- Used to reduce error
- Grocery items, credit cards, overnight mail,
magazines, personal checks, travelers checks,
soft drink cans, and automobiles have check
digits (COMAP)
4Congruence mod n
- a is congruent to b, mod n, if n divides
- a-b (Stillwell, Elements of Number Theory)
- For example, 27 is congruent to 6 mod 3 because
27-621, which is divisible by three. - Similarly, a number is congruent (mod n) to its
remainder after being divided by n for example,
20/36 R2, so 202 (mod 3)
5Money Order
- COMAP example- The identification number on a
money order is 63024383845. - The last digit is the check digit, so the first
10 digits should be 5 mod 9. - How do we divide such a large number by 9?
- How do we know this?
6Calculating Divisibility Rules
- Take a number zabcdefg
- This could also be written as z g100 f101
e102 d103 c104 b105 a106 - Lets replace each 10n by its congruence mod 9
7Divisibility by 9 continued
- 10011 (mod 9)
- 101101 (mod 9)
- 102111 (mod 9)
- By analogy, we see that every place will have a
weight of 1, so a(1)b(1)c(1)d(1)e(1)f(1)g(1)
leaves the same remainder (mod 9) as abcdefg, so
this is our divisibility rule.
8Divisibility by 11
- Well proceed in the same way
- 1001 mod 11
- 101-1 mod 11
- 102 -1-11 mod 11
- 103 1021011-1-1 mod 11
- In this way, we see that abcdefg(geca)-(fdb)
(mod 11) - Ex. 3458679(9653)-(784)4 (mod 11), so is
not divisible by 11.
9Money Order Revisited
- If the check digit only checks the divisibility
of the sum, what can be overlooked?
10Potential Problems with the Check Digit
- Any number can be replaced with its congruence
mod n (so 0 can be replaced with 9 in this
example) - Also, any digits can be easily transposed.
11Other Uses of the Check Digit
- Travelers checks and Euro banknotes use the
check digit, but the entire number, including the
check digit, should be divisible by 9. - If you know the divisor rule, you can easily
figure out what the check digit should be. For
example, if a travelers check ID number is
3487956321 and it has a divisor rule of 9, what
should the check digit be?
12Answer
- The sum of the digits is 48, and 48654, which
is divisible by 9, so the check digit should be 6.
13More Sophisticated- UPCs
- A 12-digit number that is found along the bottom
of a barcode - A BBBBB CCCCC D
- A is the type of good, B is the
manufacturers code, C is the product code, D
is the check digit (COMAP)
14UPC Error Prevention
- Take the code A BCDEF GHIJK L. Going from left
to right, calculate 3AB3CD3EF3GH3IJ3KL.
If it isnt divisible by 10, it is an incorrect
UPC. - This detects all single-position errors and
about 89 of other errors (COMAP 326). - Try this on your own!
15The U.S. Banking System
- To identify a certain bank, it has an
identification number (the first string on the
bottom of the check). The string is ABCDEFGH,
and the check digit is I. I must be the last
digit of the resulting sum of 7A3B9C7D3E9F7G
3H
16Here, we can see that 7(1)3(2)9(1)0007(2)3(
4) 48. Since 8 is the check digit, this is a
valid number.
http//blog.wellsfargo.com/GuidedByHistory/images/
Earth_Check_large.jpg
17Credit Card Numbers
- A more effective algorithm, which is used for
credit cards, is called Codabar, and is also used
by libraries, blood banks, photofinishing
companies, German banks, and the South Dakota
drivers license department (COMAP 327). - Codabar allows computers to detect 100 of
single-position errors and about 98 of other
common errors (COMAP 328). - The Codabar algorithm was developed by Hans Peter
Luhn (1896-1964) and was patented in 1960
(http//www.merriampark.com/anatomycc.htm).
18The Algorithm
- Assume a 16-digit card number, with the final
number being the check digit. Add every digit in
the odd-numbered spaces (going from left to
right) and multiply by two. - Then add the number of digits in odd-numbered
spaces that gt4. - Finally, add all the digits in the even-numbered
spaces (except for the check digit). - The check digit will need to be whatever will
make the total divisible by 10. - Ex. What is the check digit for card number
124785943967210?
19Solution
- Ex. What is the check digit for card number
124785943967210? - 1489362033266
- 66369
- 692754971 104
- 1046110, so the check digit is 6.
20ISBN Numbers
- The 10-digit International Standard Book Number
detects 100 of single errors and 100 of
transposition errors (COMAP 328) - An ISBN of A-BCDE-FGHI-J, with J as the check
digit, is valid if 10A9B8C7D6E5F4G3H2IJ
is divisible by 11.
21ISBNs continued
- Example 0-387-95587-9
- 10(0)9(3)8(8)7(7)6(9)5(5)4(5)3(8)2(7)928
6 - 286 is divisible by 11 because (62)-80, which
is divisible by 11.
22Why does this work every time?
- COMAP proof
- Let there be an error in the B slot called B.
- Then both calculations (with and without errors)
must be divisible by 11 in order for B to not be
detected. Then the difference between the
calculations must be divisible by 11.
23Proof continued
- So (10A9B8C7D6E5F4G3H2IJ)
(10A9B8C7D6E5F4G3H2IJ) 9(B-B). - Since B,B lt9, B-B cannot be 11, and so 9(B-B)
cannot be divisible by 11 unless BB.
24Another Example
- What is the check digit for the ISBN 0-7167-1910?
25Solution
- 10(0)9(7)8(1)7(6)6(7)5(1)4(9)3(1)2(0)
199 - The next number divisible by 11 is 19910209.
- To represent 10, ISBN numbers have an X.
26Code 39
- This uses the digits 0-9 and letters A-Z (which
correspond to 10-35) - Code 39 is used by the DoD, automotive companies,
and the health industry (COMAP 329). - A 15-character string is validated by whether
15a14b13c.1o is divisible by 36 (with o as
the check digit). - The VIN system is a more complicated alphanumeric
system.
27Bar Codes
- To decode the information in a bar code, a beam
of light is passed over the bars and spaces via a
scanning device, such as a handheld wand of a
fixed-beam device. The dark bars reflect very
little light back to the scanner, whereas the
light spaces reflect much light. The differences
in reflection intensities are detected by the
scanner and converted to strings of 0s and 1s
that represent specific numbers and letters.
(COMAP 334)
28Postnet Codes
- A bar code for a ZIP4 code (1 check digit)
- There are 52 long or short bars, with one guard
bar on either side and the remaining 50 bars
grouped into 10 groups of 5, with 2 long bars and
3 short bars each. - The check digit makes the sum of all ten numbers
divisible by 10.
29http//www-math.cudenver.edu/wcherowi/jcorner/bar
codes.html
30UPC Bar Codes
- Each digit is made up of seven modules
- There are guard bars, a center division, and a
difference between manufacturer and product
numbers to make reading the codes as accurate as
possible. - Bar codes for UPCs have been in use since a pack
of Wrigley Juicy Fruit gum was scanned on June
26, 1974 in Marshs Supermarket in Troy, Ohio.
The first barcodes were made by National Cash
Register, but smearing ink problems soon made IBM
the top contender in the market.
(http//en.wikipedia.org/wiki/Barcode)
31Digit Manufacturer Product
0 0001101 1110010
1 0011001 1100110
2 0010011 1101100
3 0111101 1000010
4 0100011 1011100
5 0110001 1001110
6 0101111 1010000
7 0111011 1000100
8 0110111 1001000
9 0001011 1110100
http//en.wikipedia.org/wiki/Universal_Product_Cod
e Is the UPC above valid? (3013363512
60 0 mod 10, so yes, it is valid.)
32Illinois Drivers License Numbers
- In contrast to Social Security numbers, an
Illinois drivers license number can help
reconstruct a persons surname (by sound, not by
spelling), first and middle initials, date of
birth, and gender. - These forms of ID numbers are also used in the
National Archives, the Library of Congress, and
in genealogy research. (COMAP 341-2).
33Soundex Coding System for Surnames
- Delete all occurrences of h and w. (so Wachs
becomes acs) - Assign number as follows a,e,i,o,u,y 0
b,f,p,v 1 c,g,j,k,q,s,x,z 2 d,t 3 l 4
m,n 5 r 6 (so acs 022) - If two or more letters with the same numeric
value are adjacent, omit all but the first (so
were left with ac)
34Soundex continued
- 4. Delete the first character of the original
name if still present. - 5. Delete all occurrences of a,e,i,o,u,y. (so
were left with c) - 6. Retain only the first three digits
corresponding to the remaining letters append
trailing 0s if fewer than three letters remain
precede the three digits with the first letter of
the surname (so we have W200) - Because of the way this is coded, many errors in
spelling are taken into account. - taken directly from COMAP 342
35The Middle Digits-First Initial
- InitialCodeInitialCodeInitialCodeInitialCode
- A 0 H 320 O 640 V 860
- B 60 I 400 P 660 W 880
- C 100 J 420 Q 700 X 940
- D 160 K 500 R 720 Y 960
- E 200 L 520 S 780 Z 980
- F 240 M 540 T 800
- G 280 N 620 U 840
36The Middle Digits-Middle Initial
- InitialCodeInitialCodeInitialCodeInitialCode
- A 1 H 8 O 14 V 18
- B 2 I 9 P 15 W
19 - C 3 J 10 Q 15 X 19
- D 4 K 11 R 16 Y 19
- E 5 L 12 S 17 Z
19 - F 6 M 13 T 18
- G 7 N 14 U 18
37Calculating the Middle Digits
- Add the code for the first initial to the code
for the middle initial. For my initials, MJ, you
have 54010550. - So far, my number is W200-550.
- All middle digits information taken from
http//www.highprogrammer.com/alan/numbers/dl_us_s
hared.html
38The Last Five Digits
- In Illinois, the last five digits retain the
birth date and sex of the person. - Each month is assumed to have 31 days (starting
with January 1 as 001). My birthday, March 2, is
therefore 231622064. If male, you are done.
If female, add 600 to this number (so I am 664). - Put the last two digits of your birth year before
this number. Thus, the last five digits of my
Illinois drivers license is 8-8664. My complete
number is W200-5508-8664. As the website author
points out, IL not having overflow numbers makes
it likely to have multiple people with the same
number printed on their license. - COMAP 343
39Conclusions
- We have seen several ways of identifying objects
or people with numbers and the likelihood of
error and the ways error is caught with each one.
As COMAP points out on p. 329, Like many
practices in the real world, historical
accident and lack of knowledge about existing
methods seem to be the explanation for having so
many means of identification. We see this
especially in the difference between drivers
license numbers and SSN, which were assigned
prior to computers and the development of many of
the systems used (like Soundex).