Part Numbering and ID codes: general considerations and check digits

johnhwoodsslideshare 54 views 50 slides Apr 28, 2024
Slide 1
Slide 1 of 50
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50

About This Presentation

Updated version of Part Numbering presentation: some discussion of check digit algorithms illustrated with Smalltalk snippets.


Slide Content

Part Numbers, Codes, IDs &c.
Let’s talk about how we refer to our parts
– and how it’s such an appropriate subject for
Smalltalk

[email protected] 2/50
I’m John H Woods, chronically online since 5BW (before Web).
So, first profile picture predates digital cameras: stuck my head
on a flatbed scanner. Fortunately technology has moved on!
1First entered IT as a job after learning
Smalltalk during Biosciences PhD 1998
2Have consulted widely over many
domains and much tech. since then.
3(But, I have still never met language
I didn’t think was much, much worse
than Smalltalk.)
I’m johnhwoods on Discord, Threads, The Register, gmail, hotmail, wikipedia and elsewhere
1but if to contact me specifically about this, click here: [email protected]
HELLO!

[email protected] 3/50
What makes a good part identification code?

[email protected] 4/50
What makes a good part identification code?
1Unique in our organization

[email protected] 5/50
What makes a good part identification code?
1Unique in our organization
2Unambiguous for us, our partners, clients and suppliers

[email protected] 6/50
What makes a good part identification code?
1Unique in our organization
2Unambiguous for us, our partners, clients and suppliers
3Simple, consistent logical structure

[email protected] 7/50
What makes a good part identification code?
1Unique in our organization
2Unambiguous for us, our partners, clients and suppliers
3Simple, consistent logical structure
So far, so uncontroversial …

[email protected] 8/50
What makes a good part identification code?
1Unique in our organization
2Unambiguous for us, our partners, clients and suppliers
3Simple, consistent logical structure
aDecimal only; I will die on this hill
bVariable width (different hill, but same)
cNo leading zeros! And perhaps more restrictions …
dA single, transparent, punctuation mark to structure the digits

[email protected] 9/50
What makes a good part identification code?
1Unique in our organization
2Unambiguous for us, our partners, clients and suppliers
3Simple, consistent logical structure
aDecimal only; I will die on this hill
bVariable width (different hill, but same)
cNo leading zeros! And perhaps more restrictions …
dA single, transparent, punctuation mark to structure the digits
4ABOVE ALL: Use a check digit! See you on Hill 3!

[email protected] 10/50
Common and good advice rejected
1Grouping codes by compatibility categories
aembeds assembly details: a future-proofing nightmare
2Using constant width fields
acontradicts meaningful structure (more later …)
bimpacts scalability and future-proofing: exhaust a short
number range – or end up with inconveniently long ones
ccommonly used part codes as lengthy as rare ones
3Using meaningful alphanumeric codes
aembeds too much info, increases length and complexity
bambiguity and locale issues with letters and acronyms
cimpacts future-proofing as parts are replaced or retired

[email protected] 11/50
Hill 1: Only Use Decimal
1Better internationalization (‘i18n’)
aunderstood even in CJK and RTL languages
bavoids confusion regarding meaning of letters and letter sequences
2Reduction of errors
aLess ambiguity when written (O-D-0-o, 8-B, I-l-1, S-5, G-6-b, 2-Z-7, 9-q)
bLess ambiguity in spoken English (“oh” O-o-0, 8-A-H, S-F, M-N &c.)
cEasy character set (or … is it Hex, Base 36, 64, or other? CaSe SeNsItIvE?)
3Better compatibility with other data systems
aNearly everything accepts integers (31 bits gives over 2 billion numbers)
bLess comparison ambiguity (all part numbers are different unique integers)
4Only numeric keys required: speed of entry / simplicity of devices
5Improved choice of check-digit algorithms (more later …)

[email protected] 12/50
Surely you can’t be serious?
1I am serious and don’t call me Shirley.
2But decimal numbers are less compact than alphanumeric?
aNot much: 20% longer than Hex, 50% longer than base 36
bJust 10 digits is already literally billions of part numbers
3But they’re harder to remember / don’t have meaning!
aThat’s why we have computer systems… and a description column!
bPeople using them frequently will remember shorter codes, people
who don’t won’t need to, and will perform simple lookups as required
cGrouping can give numbers meaning. A woodworker, realizing all
planed timber is nnn-11-, will know what a nnn-11-2400-47-100 is;
a mechanic will be able to guess what a nnn-19-225-45 might be; and
a librarian is unlikely to have to look up 903.

[email protected] 13/50
Hill 2: Variable width
1Allows for meaningful structure and therefore changes
aVersion 2 of a nnn-101-234 can be nnn-101-234-2
bThis is especially powerful with optional punctuation (next slide…)
2Allows for shorter codes for commonly used parts
aWith a check digit (see later…) there are ten two digit, a hundred
three digit, and a thousand four digit codes: 1110 short numbers
bAvoids short codes like 31 having meaningless leading zeros
(e.g. 00000031) or other prefixes (e.g. ‘30000031’)
3The advantages of fixed width are now limited
aSort order is unimportant – and antithetical to structure
bError checking by check digit far superior to digit counting
cOutput formatting is less burdensome on modern systems

[email protected] 14/50
thebenefitsofseparationwithpunctuation
1Grouping with some form of punctuation
aimproves readability with more than 5 or 6 digits (e.g. ‘the long card number’)
bcan improve familiarization, allowing memorization or even meaning
ccan allow the use of numbers that are already recognized, like tyre sizes
2However
aspaces can wrap and other data systems can do weird things with them
balmost every other symbol has a meaning already, including commas, periods
and both slashes, and the ‘vertical bar’ | can look too much like 1, l or I.
c_ isn’t on a numeric keypad, generally needs <SHIFT>, and has a long name
3The hyphen (-) is a good compromise (usually on the top corner of
a numeric keypad) and has a nice short name (‘dash’) in English.
NB Punctuation must be transparent so ‘101-234-2’ is 1012342
Integers transfer unambiguously between systems

[email protected] 15/50
Zero: The trouble with l'œuf
1Leading zeros
aoften get mangled by other systems
bdestroy the valuable 1:1 relationship to integers
care not required when we’ve already abandoned constant width
dare invisible to many check digit algorithms (usually by design)
2And, all zeros
acan cause some Damm issues (this algorithm discussed later …)
bcan appear dotted/slashed, 0, sometimes confused with 8
3But, banning all zeros reduces
athe number of items representable with a given number of digits
bability to use numbers with real meanings, such as 2400 for 2.4m

[email protected] 16/50
Hill 3: Check your digits – with Check Digits!
1When humans enter or communicate data, they make errors:
aLetter / digit errors: 0 ⬌ O or I ⬌ 1 (N/A in decimal only codes)
bSingle digit errors: 8 ⬌ 9 (and more keys are adjacent on a numpad)
cSimple transposition error: 98 ⬌ 89
dTwin error: 88 ⬌ 99
eJump transposition error: 317 ⬌ 713
fJump twin error: 313 ⬌ 414
gPhonetic: 60 ⬌ 16 (“sixty” ⬌ “sixteen”)
2Simpler algorithms: easier and faster, but detect fewer errors
3Let’s see how to get the maximum error detection when using
a single digit checksum …

[email protected] 17/50
Modulo 10 Sums
1We could just sum all the digits modulo 10:
12345 = 1+2+3+4+5 = 15; divide by 10 is 1 remainder 5
Our new number would be 123455 (or 12345-5 etc.)
We can detect all single digit errors:
12445-5 = 1+2+4+4+5 = 16, modulo 10 is 6
but check digit is 5 and therefore so we know the number is invalid
This approach similarly detects insertions and deletions
– but not (most) transpositions, as the sum of the digits is unchanged
2Smalltalk
('12345'
inject: 0 "start value; sum is accumulated"
into: [ :sum :char | sum + char digitValue ]) \\ 10

[email protected] 18/50
Weighted Sum, Modulo 11
1To detect transpositions, we ‘weight’ the digits:
The ISBN-10 of the Blue Book is 0-201-11371-6
ISBN-10 uses weights of the digit position:
0×1 + 2×2 + 0×3 + 1×4 + 1×5 + 1×6 + 3×7 + 7×8 +1×9 + 6×10 = 165
165 modulo 11 is 0 and that means that the number is valid.
The rightmost digit is a check digit whose value is adjusted to
ensure the weighted sum of the digits is an exact multiple of 11
2UK NHS Numbers use the same technique.
3Note that the last digit might need a value of 10! So:
aIn the NHS, the next available number is used
bIn ISBN-10, the last digit is an X, e.g. 0-14-118260-X
(Who loves a Roman numerals joke? I, for one!)

[email protected] 19/50
Weighted Sum, Modulo 11 in Smalltalk (1/2)
"WARNING: There is no error checking in any of these snippets.
Initial checks must be done by Regex or manual format checking."
| weight |
weight:= 0. "left to right, weights are 1,2,3, ..."
('6088489006' inject: 0
into: [ :checksum :char |
(weight:= weight+1) * char digitValue + checksum ]) \\ 11
"or, with an additional method, the weight is the index (In
Smalltalk, arrays, strings &c. start at 1 – as nature intended!)"
('608848900' withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ]) \\ 11
"The former returns 0 indicating validity. The latter, with a 9
character input, returns 6, which is the value of the check digit
needed to make the number valid, as in the former case.

[email protected] 20/50
"NB: the checksum can be 10, e.g. some editions of Strunk & White..."
('020530902' withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ]) \\ 11
"Returns 10, which is why the original ISBN was 0-205-30902-X.
The UK NHS addresses this problem by simply not using any numbers
which would require a check “digit” character outside the range 0-9"
"But, for ISBN-10, we need to handle the occasional terminal X"
| isbn |
isbn:= '020530902X'.
(isbn allButLast withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ]) \\ 11
= (isbn last = $X ifTrue: [ 10 ] ifFalse: [ isbn last digitValue ])
"Generating ISBN-10s is left as an exercise for the reader"
Weighted Sum, Modulo 11 in Smalltalk (2/2)

[email protected] 21/50
ISBN-13 is EAN-13
1ISBN numbers migrated from 10 to 13 digits in 2007,
for compatibility with EAN (European Article Number),
now internationally known as GTIN (Global Trade Item
Number).
aThe symbology includes only digits 0-9
therefore a modulo 10 algorithm is used.
bCounting from right to left, odd positions
are weighted 1, even positions weighted 3.
2ISBN-10s are converted by prefixing 978 to the payload
and appending the new EAN-13 check digit.
3+2×3+0+9×3+0+3×3+5+0×3+2+0×3+8+7×3+9 = 90 ✓

[email protected] 22/50
EAN/GTIN in Smalltalk (1/2)
"reversed makes no difference in the GTIN-13 case as it has an odd number of
digits, but allows GTIN-8 and GTIN-18 with no need to check input parity"
"Note the index flip when moving from validation to generation"
| lastWeight |
lastWeight:= 3. "right to left, weights are 1,3,1,3 ..."
('1234512745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ]) \\ 10
"Returns 0, which indicates the number is correct"
"To generate a check digit, run the algorithm on the `payload`
of the first 12 digits and subtract the result from 10"
| lastWeight |
lastWeight:= 1. "right to left, weights are 3,1,3,1 ..."
(10 - ('123451274565' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ]) \\ 10) \\ 10
"Returns 7 as expected from the above. If this returns 10,
the check digit is 0, hence the additional \\ 10 at the end"
"NB: without the check digit, our first digit is even, not odd."

[email protected] 23/50
EAN/GTIN in Smalltalk (2/2)
"EAN transposition check shows its most important weakness"
| lastWeight |
lastWeight:= 3.
'1235412745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum \\ 10 ]
"Returns 2, which proves the number is incorrect"
1"But, if the transposed digits differ by 5, the error goes undetected:
x + 3(x+5) = 4x+15 and 3x + (x+5) = 4x+5, which are congruent modulo 10"
| lastWeight |
lastWeight:= 3.
'1234517245657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum \\ 10 ]
"Returns 0 indicating validity. However, such transpositions are less
frequent: digits different by 5 are not adjacent on either the number row or
standard numeric keypads (whether calculator style or telephone style)"

[email protected] 24/50
Luhn
1Patented in 1960 but now in the public domain, Luhn is
used for: credit card numbers; IMEIs (mobile handsets);
European Patent Application numbers; ICCIDs (SIM cards);
several government IDs from round the world; &c.
2Another “sum digits with alternating weights modulo 10”
method; the weights, again starting from the right,
alternate 1 and 2, rather than EAN/GTIN’s 1 and 3.
3However, where doubling a digit gives a two digit result,
we use the sum of the digits of the answer.
4So, when in even digit positions, 0,1,2,3,4 yield 0,2,4,6,8
– but 5,6,7,8,9 yield 1,3,5,7,9 – not 10,12,14,16,18.

[email protected] 25/50
Luhn Example
1To validate: 8905543677
2Going right to left every other digit we get
7+14+6+6+4+10+5+0+9+16
3Replacing all two digit results with their digit sum yields
7+5+6+6+4+1+5+0+9+7 = 50 mod 10 = 0.
4As with GTIN, if the sum is a multiple of 10, the number is
valid; and, as with GTIN, this is effectively the same operation as
dropping the check digit, calculating a check digit from the
payload, and comparing it to the existing check digit.
5As with GTIN, to generate a check digit, we must flip the weights
as the now missing check digit was digit 1, and we start by
doubling the first digit as we go right to left.

[email protected] 26/50
Luhn Smalltalk (1/5) –digits and characters
"How to sum digits of a two digit number?"
14 printString fold: [:a :b | a digitValue + b digitValue]
"Or just add the tens and the units"
(0 to: 9) collect: [:n | | d |
d:= n*2.
d // 10 + d \\10 ]
"#(0 2 4 6 8 1 3 5 7 9)"
"Or, once we have that result, just use it as a lookup table"
(0 to: 9) collect: [:n | #(0 2 4 6 8 1 3 5 7 9) at: 1+n"
"Also, by using digitValue the argument can be any Unicode characters that
represent the decimal digits, e.g. Sinhala"
'෦෧෨෩෪෫෬෭෮෯ ' asArray collect: [ :char | char digitValue ]
"#(0 1 2 3 4 5 6 7 8 9)"
NB: We did all the input sanitization before, right? Right?"

[email protected] 27/50
Luhn Smalltalk (2/5) – using a lookup table
"This uses alternating offsets of 11 and 1 with a lookup table. The validation
code, highlighted, is only four lines. This Snippet will return any invalid
numbers in the array. All the inputs in the example below are valid."
| luhn |
luhn:= #[ 0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 1 3 5 7 9 ].
#(18 #[3 7 2] '٥٦٤٨٧٢' "formats: integer, byte array, unicode string"
8905543677 "my example from earlier slide"
17893729974 "the example from Wikipedia"
1234567812345670 "the example from Rosetta Code"
'3566002020360505' "an example Credit Card number, spaces removed"
339269927028426 "an example IMEI number"
) reject: [ :input | | index |
index:=11.
(input asString reversed inject:0 into:[:sum :digit || codePoint |
codePoint:= digit asInteger & 15.
sum + (luhn at:(index:= 12-index)+codePoint) ])\\10=0]

[email protected] 28/50
Luhn Smalltalk (3/5) – tables beyond 0 to 9
"Unlike the other single digit check digit algorithms here, Luhn can be
generalized to larger (or smaller) character sets with an even number of
symbols. Lookup table generation translates to Smalltalk as follows:"
| charset base luhn |
charset:= '0123456789ABCDEFGHJK'. "works with any symbols, in any order"
base:= charset size.
(base\\2) isZero
ifFalse: [ self error:'Luhn character set must have even size' ].
luhn:= OrderedCollection new.
luhn addAll: (0 to:base-1).
0 to: base-1 do: [:digit || double |
double:= 2 * digit.
luhn add: double//base + double\\base ].
luhn asByteArray
"#[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 2 4 6 8 10 12 14 16 18 1 3 5 7 9 11 13 15 17 19 ]"
e.g.: 14 ×2 = 28
which is 18 in base 20
so digit sum is 9
Note: a nice
easy pattern
is emerging

[email protected] 29/50
Luhn Smalltalk (4/5) – validation modulo N
"Note table construction and validation code. Snippet takes an array and
returns an array of those elements which are invalid. Lookup table could be
cached but typically construction is 2-5µs, with each check 1-2µs "
| base charset luhn |
base:= (charset := 'abcdef') size.
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( 'abcdefe' 'abcb' 'defc' )
reject:[:input || index |
index:= 1+base.
(input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])\\base=0]
"NB: In design, ensure the charset size is even, with no repetition"
"NB: In execution, ensure that each argument, after removing allowed
punctuation, comprises only characters from the character set"

[email protected] 30/50
Luhn Smalltalk (5/5) – generation modulo N
"Because (m-(x mod m) mod m) simplifies to (-x) mod m, the generation
method (again note that the index is flipped) is as follows ..."
| base charset luhn |
base:= (charset := ' ✓ ') size.
??????☀ ⚙??????⚡
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( ' ✓ ' ' ✓' ' ' )
??????☀ ⚙??????⚡ ??????☀ ⚙??????⚡
collect:[:input || index |
index:= 1.
input asString, (charset at: 1 + ((input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])
negated \\ base)) asString ]
"#(' ✓ ' ' ✓ ' ' ✓')"
??????☀ ⚙??????⚡?????? ??????☀ ☀ ⚙??????⚡
"Check digits Pile of Poo Sun With Rays and Check Mark are appended"
very little code
is required to
generate
Luhn mod N
check digits

[email protected] 31/50
Luhn Weaknesses
1Twin errors:
22 2 + 4
??????
6
?????? ??????
5 + 1 5 + 10
?????? ??????
55
33 3 + 6
??????
9
?????? ??????
6 + 3 6 + 12
?????? ??????
66
44 4 + 8
??????
12
?????? ??????
7 + 5 7 + 14
?????? ??????
77
Unfortunately all three of these errors comprise adjacent keys
on a numeric keypad; less likely to be typed on a number row
2Transposition error:
90 9 + 0
??????
9
?????? ??????
0 + 9 0 + 18
?????? ??????
09
Not only adjacent on number row, but at the extreme right,
both conventionally operated by digitus minimus dexter
1
1
right pinky finger

[email protected] 32/50
Luhn Conclusions
1Works with any symbol set
athere must be an even number of symbols
–no problem for octal, decimal, hexadecimal and the Latin alphabet
bbut symbol codes do not need to be contiguous or in any particular order
2Allows codes as short as 2 symbols, and up to arbitrary length.
3Easy to implement, especially for the standard case of digits 0..9
4Fast: typically lookup table construction 2-5µs; then validation < 1µs
5Widely used, well-understood and still extremely useful
6Has some unfortunate weaknesses
a60% of twin errors undetected, and easy to key on a numpad
bcannot detect transpositions of the end digits in its character set
–and this is particularly relevant with the 0..9 case

[email protected] 33/50
Half Time!
1Smalltalk is great! Any dissent to be handled later – in the car park ??????
2Some codes have double digit checksums (checksa?)
ae.g. International banking codes: IBAN uses a modulo 97 algorithm
bpossibly overkill for part numbering, we’ll stick to single digits
3Despite the appeal of Luhn Mod N, decimal codes remain optimal
abut weaknesses of Luhn 0..9 make it subobtimal for part numbering
band countering the weakness by manually preventing the creation of
numbers with 09 or 90, or by using the digits out of order, e.g.
1234567890, would result in non-standard implementations and
possibly maintainability issues
4Advantages over later algorithms turn out to be underwhelming:
aIt’s not that much faster
bIt’s not that much simpler

[email protected] 34/50
Verhoeff
1In 1969 Jacobus Verhoeff presented the first decimal check digit algorithm
which detects all adjacent transposition errors and all single digit errors,
despite this having been thought (even mistakenly ‘proved’) impossible.
2From some eye-watering maths, he created three tables: p, d, and inv.
[Wikipedia]

[email protected] 35/50
Navigating the Verhoeff Tables: Validation
1Starting with, e.g., ‘3572’ and c = 0, we go L R
??????
from position 0 (the check digit)
2The permutation table, p, gives us p(0,2) (0 is position modulo 8, 2 is the digit) = 2
3The 2 goes into the multiplicative table, d, with the current c, giving c = d(0,2) = 2
4Moving left: p(1,7) = 0; c=d(2,0) = 2
5Moving left: p(2,5) = 9; c=d(2,9) = 6
6Moving left: p(3,3) = 6; c=d(6,6) = 0
7We have ended on c = 0
8We look up 0 in the inverse table, inv, and get 0
9The number 3572 is therefore valid ✓
What is the point of step 8? The only 0 in inv maps to 0.
Surely we could just ignore the inverse table for validation?
We can, but we won’t – and the generation method tells us why…

[email protected] 36/50
Navigating the Verhoeff Tables: Generation
1Starting with, e.g., ‘3570’ and c = 0, we go L R
??????
from position 0 (the check digit)
2The permutation table, p, gives us p(0,0) (0 is position modulo 8, 0 is the digit) = 0
3The 0 goes into the multiplicative table, d, with the current c, giving c = d(0,0) = 0
4Moving left: p(1,7) = 0; c=d(0,0) = 0
5Moving left: p(2,5) = 9; c=d(0,9) = 9
6Moving left: p(3,3) = 6; c=d(9,6) = 3
7We have ended on c = 3
8We look up 3 in the inverse table, inv, and get 2
9The number 357 therefore needs the check digit 2 to become 3572 as previous slide
So, to generate a check digit we suffix a zero
1
to our input and
use the same method – and the result is our new check digit.
1
This is equivalent to multiplying the equivalent integer by 10.

[email protected] 37/50
Verhoeff in Smalltalk (1/3) - lookups
Recalling Arrays start at 1, we have the following lookup methods:
privateInvTableAt: j
^ #[ 0 4 3 2 1 5 6 7 8 9 ] at: j+1
privatePtableRowAt: r column: c
^ ( #( #[ 0 1 2 3 4 5 6 7 8 9 ]
#[ 1 5 7 6 2 8 3 0 9 4 ]
#[ 5 8 0 3 7 9 6 1 4 2 ]
#[ 8 9 1 6 0 4 3 5 2 7 ]
#[ 9 4 5 3 1 2 6 8 7 0 ]
#[ 4 2 8 6 5 7 3 9 0 1 ]
#[ 2 7 9 3 8 0 6 4 1 5 ]
#[ 7 0 4 6 9 1 3 2 5 8 ] )
at: r+1)
at: c+1
privateDtableRowAt: r column: c
"as above but with d(j,k) table”
allowedCharacters
^Array new
,'0123456789' "Western Arabic”
,'٠١٢٣٤٥٦٧٨٩' "Eastern Arabic”
,'०१२३४५६७८९' "Devanagari"
,'০১২৩৪৫৬৭৮৯' "Bengali"
,'௦௧௨௩௪௫௬௭௮௯ '"Tamil"
,'๐๑๒๓๔๕๖๗๘๙' "Thai"
,'၀၁၂၃၄၅၆၇၈၉' "Myanmar"
,'෦෧෨෩෪෫෬෭෮෯ '"Sinhala"
allowedSeparators
^Array new
with: $-
with: Character space

[email protected] 38/50
Verhoeff in Smalltalk (2/4) - API
Private method,
privateVerhoeff: digits
"NB: private – no error checking
argument is any SequenceableCollection
of integers in (0 to:9); returns a
single Integer in that range."
| pos c |
pos := c := 0.
digits reverseDo: [:n | | p |
p := self pTableAtRow: pos col: n.
c := self dTableAtRow: c col: p.
pos := pos + 1 \\ 8 ].
^ self invTableAt: c
Private
VerhoeffString: aString
"Takes unpunctuated string of Unicode
with digitValue (0 to:9), returns a
single Integer in that range."
aString isString
ifFalse: [ self error: '...' ].
aString size > 1
ifFalse: [ self error: '...’ ].
aString reject: [ :char |
self allowedCharacters
includes:char]) isEmpty
ifFalse: ['...’ ].
aString first digitValue > 0
ifFalse: [self error: '...' ].
^ self privateVerhoeff:(aString asArray
collect: [ :char | char digitValue ])
Public method,

[email protected] 39/50
Verhoeff in Smalltalk (3/3) – in one block
"Verhoeff algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
| c pos |
pos:= c:= 0. "checksum and position (right to left) start at zero"
3572 asString reversed do: [ :digit |
c := (#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 2 3 4 0 6 7 8 9 5 )
#( 2 3 4 0 1 7 8 9 5 6 ) #( 3 4 0 1 2 8 9 5 6 7 )
#( 4 0 1 2 3 9 5 6 7 8 ) #( 5 9 8 7 6 0 4 3 2 1 )
#( 6 5 9 8 7 1 0 4 3 2 ) #( 7 6 5 9 8 2 1 0 4 3 )
#( 8 7 6 5 9 3 2 1 0 4 ) #( 9 8 7 6 5 4 3 2 1 0 ) ) at:1+c)
at:1+((#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 5 7 6 2 8 3 0 9 4 )
#( 5 8 0 3 7 9 6 1 4 2 ) #( 8 9 1 6 0 4 3 5 2 7 )
#( 9 4 5 3 1 2 6 8 7 0 ) #( 4 2 8 6 5 7 3 9 0 1 )
#( 2 7 9 3 8 0 6 4 1 5 ) #( 7 0 4 6 9 1 3 2 5 8 ) )
at:1+pos) at: digit digitValue + 1).
pos:= pos + 1 \\ 8 ].
#( 0 4 3 2 1 5 6 7 8 9 ) at:1+c
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"

[email protected] 40/50
Verhoeff Weaknesses?
1Verhoeff only works with decimal codes, but:
abut we already rejected non-decimal codes.
2The algorithm is complex, but:
athe implementation is just table lookups: 2 per digit and 1 more
bthe operations involved, unicode lookup, table lookup &c. are simple
cit’s pretty fast, ~2µs, and comparable to Luhn
3Unlike most checksum algorithms, it is sensitive to leading zeros, but:
athat’s actually useful for those who do use leading zeros
bhopefully I’ve persuaded you that isn’t going to be us!
4There may be other weaknesses, so:
afind someone with a better understanding of dihedral groups
bdraw statistical inferences from experimentation (later… )

[email protected] 41/50
Damm
1In 2004 Michael Damm published this algorithm in his PhD on
Totally Antisymmetric Quasigroups … Yes, me neither!
2Again the underlying maths is complex but translates to table lookups,
even easier ones than Verhoeff: just one table to iterate and no inverse.
3Start with 0, end with 0:
[Wikipedia]

[email protected] 42/50
Navigating the Damm Table
1Starting with, e.g., ‘64353’ and r = 0, we go L R
??????
(for a change!)
2column 6 row 0 8
??????
3column 4 row 8 6
??????
4column 3 row 6 9
??????
5column 5 row 9 3
??????
6column 3 row 3
??????
0
7We have ended on r = 0
8There’s no inverse or final lookup table, once we hit 0, it’s valid
9The input ‘64353’ is therefore valid ✓
How do we manage without an inverse table? Because the 10x10 Latin Square we are
using for each iteration has a zero diagonal, when row r = column c, then r,c 0.
??????
Consequently, the non-zero value returned on failed validation is also the check digit
that should be appended on generation. So, as with Verhoeff, just one method for
both validation and generation.

[email protected] 43/50
Damm Smalltalk (1/2)
"Damm algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
57245724 asString inject: 0 into: [ :row :col |
#[ 0 3 1 7 5 9 8 6 4 2
7 0 9 2 1 5 4 8 6 3
4 2 0 6 8 7 1 3 5 9
1 7 5 0 9 8 3 4 2 6
6 1 2 3 0 4 5 9 7 8
3 6 7 4 2 0 9 5 8 1
5 8 6 9 7 2 0 1 3 4
8 9 4 5 3 6 2 0 1 7
9 4 3 8 6 1 7 2 0 5
2 5 8 1 4 3 6 7 9 0 ] at: 10 * row + (col digitValue) + 1 ]
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"

[email protected] 44/50
Damm Smalltalk (2/2)
"Could use the same digitValue approach to store the table as a String"
'57245724' asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue]
"Naturally we could store the table as a constant, but this hardly needs optimising."
| reader |
reader := testInput readStream.
[ reader next asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue] ] bench2.
"'O.956 picoFortnights (1.156µs)'"
"It’s clear the Damm algorithm is very simple to code, and could
easily be done client-side (e.g. in Javascript) where required."

[email protected] 45/50
Damm Weaknesses?
1Damm, like Verhoef, only works with decimal codes, but:
abut we already rejected non-decimal codes
2Like Verhoeff, the algorithm is complex, but:
athe implementation is just table lookups: 1 per digit.
bthe operations involved, unicode lookup, table lookup &c. are simple
cit’s pretty fast, ~2µs, and comparable to Luhn and Verhoeff
3Like most checksum algorithms (but unlike Verhoeff) the algorithm
is insensitive to leading zeros, but:
awho wants to use leading zeros anyway?
4There may be other weaknesses, so:
ahow are you with totally antisymmetric quasigroups?
bdraw statistical inferences from experimentation (later… )

[email protected] 46/50
Zero Redux: Un œuf is un œuf? (1/3)
1Unlike the Verhoeff algorithm, the position of digits
(their index) is not used in the calculation
2To calculate a check digit with Verhoeff, you process the
number with a 0 appended (i.e. the number × 10) and
the result is the check digit of the original number
3But in Damm, the check digit is simply obtained by
applying the algorithm to the number itself
4So in both Damm and Verhoeff a valid number returns 0
aBut in Damm that is also the next check digit
bso Damm is insensitive to trailing zeros

[email protected] 47/50
Zero Redux: More than un œuf? (2/3)
1For example, the check digit of ‘1’ is 3, giving 13, the
smallest “Damm number.” Running Damm on 13 yields
0, which, as with Verhoeff, indicates a correct checksum
2But therefore 130 also has a correct Damm checksum,
as do 1300 and 13000 &c.
3One possible defence is to simply ban trailing zero
aUsing a dash to highlight the check digit, Part 14 would become
Part 14-9 but Part 21 would become part 21-0.
bThis is not ideal if Part 2 has already become Part 2-1; so we
would just avoid having a payload (the “unchecked” raw
number) of 21.

[email protected] 48/50
Zero Redux: You had un œuf yet? (3/3)
1… but …
2Because the “interim digit” (r) in the Damm algorithm
starts with a value of 0, any Damm number followed by
another will also be a Damm number
3So 13, 1313, 13013, 130013, 1300013 all checksum
4So, Damm is “zero blind” anywhere before or after a
Damm number, including between them:
5So in, e.g., ...13…515…37…6876… each ‘…’ could be
any number of zeros (including zero of them) without
the checksum failing.

[email protected] 49/50
Defense (1/2)
1Go back to fixed width. But:
aI already chose this as a hill to die on
bIt makes commonly used part numbers long
cIt requires otherwise meaningful numbers to be padded
dIt makes meaningful number changes hard to manage:
e.g. part nnn-14- can be replaced with part nnn-14-2-
2Never mind leading or trailing, ban the zero! But:
areduces the pool of part numbers
bmay also impair choosing meaningful part numbers
–The strategy would ban numbers such as 2400 for no reason.
3Live with it: no check-digit algorithm detects all errors
Defence against the Damm Zero

[email protected] 50/50
tl:dr;
1Already have a check digit? Use that.
2Already using EAN/GTIN article numbers? Use those.
3Need alphanumerics, other symbols, or character sets
larger (or smaller) than decimal digits? Use Luhn.
4Do you need maximum error resistance? Use Verhoeff.
5Do you want effective but really simple? Use Damm
In a further presentation I will provide some statistical
analysis, some potential refinements, and my final
recommendations. “I see you shiver with anticip …”