How Unique Should a Unique Code Be?
A system I'm working on needs to generate and store unique codes that will be distributed to its users. Any user can then enter one of these codes to obtain a discount at the basket.
Because the codes are there for anybody's taking, they need to be random enough that users who shouldn't have one can't simply guess them.
The codes will be handed out ad-hoc to users who will take the code and enter it at the basket. Nothing ties the code to any given user though.
The code I've come up with to generate the codes (read "pasted and modified off Google") is like this:
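// Static so the static Generate method can use it.
// Note: System.Random is not cryptographically secure.
private static readonly Random _rng = new Random();
private const string _chars = "ABCDEFGHJKLMNPQRSTWXY3456789";

public static string Generate(int size)
{
    char[] buffer = new char[size];
    for (int i = 0; i < size; i++)
    {
        // Pick one character at random from the allowed set.
        buffer[i] = _chars[_rng.Next(_chars.Length)];
    }
    return new string(buffer);
}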
It generates codes like this:
TWEXB8GE, WHJ55459, AEJA6XP5, D4J3RXJK, NRMALE8H, WMJGQAKY, YAQQKKX7
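Since guessability is the whole concern, the random source matters as much as the length. Here is a minimal sketch of the same idea in Python (not the language of the snippet above), assuming the operating system's secure random source via the standard `secrets` module. The `issued` set is a hypothetical stand-in for wherever the system actually stores its codes.

```python
import secrets

# Same 28-character alphabet as the C# snippet above.
ALPHABET = "ABCDEFGHJKLMNPQRSTWXY3456789"

def generate(size: int) -> str:
    """Return one random code drawn from the OS's secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))

def generate_batch(count: int, size: int, issued: set) -> list:
    """Generate `count` new codes, skipping any already in `issued`."""
    batch = []
    while len(batch) < count:
        code = generate(size)
        if code not in issued:      # never hand out the same code twice
            issued.add(code)
            batch.append(code)
    return batch

if __name__ == "__main__":
    already_issued = set()          # assumption: would be a database lookup
    print(generate_batch(5, 8, already_issued))
```

Checking each new code against the ones already issued is what makes them unique. Randomness alone only makes duplicates unlikely.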
Notice that I've left out confusingly similar character combinations such as I and 1, Z and 2. Going further, I could probably leave out S and 5 as well as 8 and B?
The issue I have is one of compromise. How do I balance uniqueness and "security" with ease of entry for the end user?
A string that's 8 characters long might not look too hard to guess, but is in fact fairly unique.
If the set of characters used has 28 members and the string is 8 characters long then the chance of guessing a code is one in 28^8, or 28 to the power of 8, or 28*28*28*28*28*28*28*28, or 377,801,998,336.
Assuming my maths is right?
Here's how some other combinations stack up:
Possible Characters | Code Length | Permutations |
28 | 12 | 232,218,265,089,212,416 |
28 | 8 | 377,801,998,336 |
28 | 6 | 481,890,304 |
28 | 4 | 614,656 |
10 | 10 | 10,000,000,000 |
10 | 3 | 1,000 |
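The figures are easy to double-check. A quick sketch in Python, where `code_space` is just an illustrative helper name:

```python
def code_space(alphabet_size: int, length: int) -> int:
    """Number of distinct codes: alphabet_size choices for each position."""
    return alphabet_size ** length

# Reprint the rows of the table above.
for chars, length in [(28, 12), (28, 8), (28, 6), (28, 4), (10, 10), (10, 3)]:
    print(f"{chars} | {length} | {code_space(chars, length):,} |")
```

Each extra character multiplies the total by the size of the character set, which is why going from 6 to 8 characters adds so much more than it looks like it should.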
Maybe a code 8 characters long is too much. Probably 6 would suffice. What would you opt for?
A couple of thoughts - expiration (life span) and ease of use.
Given ease of use as a factor (how hard it is for users to type in if they can't puzzle out copy and paste) and setting an expiration (n days on m population of codes) you could well manage with 4 characters... which is readily available in the form of a Note document ID for those writing Domino based applications. I've used this for a URL clipping implementation where the short URLs are not long lived and it gives me plenty of usage. I might go to 6 just to extend the life span of the URLs out a bit, or even 8 would do if I wanted them to live on for almost-ever.
Reply
As I understand it there's a chance the codes will sometimes be printed on paper and literally handed out to users (or even non-users as an incentive to become one). So no copy/paste there, and hence I wanted to keep them as short and readable as possible.
They need to last indefinitely. There's an option for them to have an expiry date but also an option not to set one.
Reply
If the codes are going to be given out then generate a complex one and produce a QR code - http://code.google.com/apis/chart/docs/gallery/qr_codes.html.
Then just implement a barcode reader (already done for Android / iPhone) for webcams ;-)
Reply
Always suspicious of auto-generated codes. Have to make sure that they don't match real words. Otherwise odd combinations involving cfku (and other variants) could crop up.
Reply
Hadn't thought of that. Although, according to the table above, the chance of any given four-letter-word cropping up is 1:614,656. That's fairly unlikely, no?
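One refinement to that figure: 1 in 614,656 is the chance for a code that is exactly four characters long. In an 8-character code a four-letter word can start at any of five positions, so the odds are roughly five times higher. A rough sketch of the arithmetic, where `DARN` is a harmless stand-in word and the batch of 10,000 codes is an assumption:

```python
ALPHABET_SIZE = 28
CODE_LENGTH = 8
WORD = "DARN"  # hypothetical stand-in for a word you'd rather not print

# A 4-letter word can start at positions 0..4 of an 8-character code.
positions = CODE_LENGTH - len(WORD) + 1

# Chance that one specific 4-character window spells the word.
p_window = 1 / ALPHABET_SIZE ** len(WORD)

# Upper bound (and, for numbers this small, a close estimate) per code.
p_code = positions * p_window

# Expected number of affected codes in an assumed batch of 10,000.
expected_in_batch = 10_000 * p_code

print(f"per code: about 1 in {1 / p_code:,.0f}")
print(f"expected hits in 10,000 codes: {expected_in_batch:.3f}")
```

That works out at about 1 in 123,000 per code for any single word. It's still unlikely, but the risk grows with every word on the blacklist and every batch generated.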
Reply
It depends on the value of what you are protecting.
If it's a bra size for Victoria's Secret and there isn't a name or any info to identify the person, keep it as short as possible.
If it hides a home address, kill it and opt for user/password plus token.
GUIDs are nice but not very user-friendly to type back in.
If you buy something from RIM you get a /reg.do ID=820874xxx&PD=29962xxx
So that's 18 digits, numbers only, if that's easy to retype and represents a high value.
So it depends on the value of the data.
Reply
I need to learn how to read; you mentioned the data already. Is there a policy for code reuse? So if there is a duplicate, the next one won't get a discount?
Reply
Obviously the length of the code also depends on the number of valid codes that you will give out, because a "guesser" only needs to hit any one of them.
Purely mathematically, I think that if you want to keep it short then the best way is to expand your character map as much as possible. Right now you use only 28 allowed chars. But with the whole alphabet in upper and lowercase plus the digits you can get 62.
Of course people will then have problems with similar symbols but maybe this could be resolved with the choice of a proper font.
And one more thing: if you decide to have a longer code then think about dividing it into smaller groups for better readability: NRMA-LE8H, NR-1234-8H, NR MA LE 8H,...
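Grouping only affects how the code is displayed, as long as the separators are stripped again on entry. A small sketch of that idea in Python, with `group` and `normalise` as made-up helper names:

```python
def group(code: str, size: int = 4, sep: str = "-") -> str:
    """Split a code into fixed-size groups for printing: NRMALE8H -> NRMA-LE8H."""
    return sep.join(code[i:i + size] for i in range(0, len(code), size))

def normalise(entered: str) -> str:
    """Undo grouping on input: drop separators and spaces, fold to upper case."""
    return "".join(ch for ch in entered if ch.isalnum()).upper()

print(group("NRMALE8H"))          # NRMA-LE8H
print(group("NRMALE8H", 2, " "))  # NR MA LE 8H
print(normalise("nrma le8h"))     # NRMALE8H
```

Note that folding to upper case on entry only works if the character set stays single-case, so it doesn't combine with the mixed-case suggestion above.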
But these are just ideas :-)
Reply
Good ideas though Hynek. Thanks!
Reply
My calculator gets 377,801,998,336 for 28**8.
How many of these are you going to give out? If you give out a million of them, then (even with my higher number) on average it will take only 377,802 guesses to crack one. That's not very many guesses if it is done with some computer assistance.
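To make that concrete, here is a rough sketch of the arithmetic in Python. It assumes the guesser picks codes at random and that nothing limits the number of attempts. The function names are illustrative only.

```python
CODE_SPACE = 28 ** 8  # 377,801,998,336 possible codes

def average_guesses(valid_codes: int) -> float:
    """Rough average number of random guesses needed to hit any valid code."""
    return CODE_SPACE / valid_codes

def hit_probability(valid_codes: int, guesses: int) -> float:
    """Chance that at least one of `guesses` independent random guesses is valid."""
    return 1 - (1 - valid_codes / CODE_SPACE) ** guesses

for issued in (10_000, 1_000_000):  # assumed numbers of codes in circulation
    print(f"{issued:,} codes issued: about {average_guesses(issued):,.0f} guesses on average")
```

With 10,000 codes in circulation the average rises to roughly 37.8 million guesses, so the number of codes issued matters as much as the code length. Limiting failed attempts per user or per IP address would change the picture far more than an extra character would.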
And there's something possibly more important than that. You probably don't want to re-use these codes, but 28**8 is only on the order of 2**26 values, so if you just generate ~8000 random codes (2**13, actually) there will be a 50% chance that you have re-used at least one. (Look up 'birthday paradox' on Wikipedia for the details.)
I would go with a longer code, and I would not make it random. I would create codes by applying a hash to a set of unique strings. This has the advantage, too, of allowing you to have customer-specific codes that can't be shared (because the hash input strings contain customer names or account numbers), or having codes that are specific to particular partner web site (by having the partner name or number in the hash input strings), or codes that are sharable and generic, all with the same format and generation mechanism.
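A sketch of what that could look like, in Python. One deliberate change from the description above: it uses a keyed hash (HMAC-SHA256) rather than a plain hash, on the assumption that anyone who could guess the input string could otherwise recompute the code. `SECRET_KEY` and the seed strings are hypothetical.

```python
import hashlib
import hmac

ALPHABET = "ABCDEFGHJKLMNPQRSTWXY3456789"   # the 28 characters from the post
SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep out of source control

def code_for(seed: str, size: int = 12) -> str:
    """Derive a code from a unique input string using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, seed.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    for _ in range(size):
        # Peel off one base-28 digit at a time and map it to a character.
        number, index = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[index])
    return "".join(chars)

print(code_for("customer:1001"))        # customer-specific code
print(code_for("partner:acme:spring"))  # partner-specific code
```

The same seed always gives the same code, so a code can be re-derived and checked against a customer or partner without storing it. Because the digest is truncated to 12 characters, two different seeds could still collide in principle, so a uniqueness check at issue time remains a sensible precaution.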
Reply
Oops! It's not 2**26. I took the natural log on my calculator instead of the log2. It's more like 2**39. That means you can generate close to 1,000,000 codes before there's a 50% probability that you generate a dupe.
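For anyone who wants to check the numbers, the usual birthday-paradox approximation is short enough to run. A sketch in Python, where the function names are illustrative and the batch of 10,000 codes is an assumption:

```python
import math

def collision_probability(codes: int, space: int) -> float:
    """Approximate chance of at least one duplicate among `codes` random draws."""
    return 1 - math.exp(-codes * (codes - 1) / (2 * space))

def codes_for_even_odds(space: int) -> float:
    """Number of random draws at which a duplicate becomes a 50/50 bet."""
    return math.sqrt(2 * space * math.log(2))

SPACE = 28 ** 8
print(f"log2 of code space: {math.log2(SPACE):.1f}")
print(f"50% collision point: {codes_for_even_odds(SPACE):,.0f}")
print(f"chance of a dupe in 10,000 codes: {collision_probability(10_000, SPACE):.4%}")
```

By this approximation the 50% point is a little over 700,000 codes, and a batch of 10,000 has roughly a 0.013% chance of containing a duplicate. Checking each new code against those already issued removes the risk entirely.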
Reply
"My calculator gets 377,801,998,336" Mine too :-)
I can't imagine there ever being more than about 10,000 of these codes in existence (and that's a high-end guess).
You're losing me with all this log2 stuff. It's been a long time since I did any advanced maths. Working out it was 28^8 took me long enough...
Reply
One other thing you could consider if you want to prevent people guessing your codes is to add a checksum character into the code. At its simplest this can just be the character that might represent the sum of all the other characters. I'm sure you could work out the details of how that would work pretty quickly.
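In case it helps, here is a minimal sketch of that simplest version in Python: the check character is the sum of the other characters' positions in the alphabet, modulo 28. The helper names are made up for illustration.

```python
ALPHABET = "ABCDEFGHJKLMNPQRSTWXY3456789"

def check_char(body: str) -> str:
    """Checksum character: sum of the characters' alphabet positions, mod 28."""
    total = sum(ALPHABET.index(ch) for ch in body)
    return ALPHABET[total % len(ALPHABET)]

def with_checksum(body: str) -> str:
    """Append the check character to a randomly generated body."""
    return body + check_char(body)

def is_well_formed(code: str) -> bool:
    """True if the last character matches the checksum of the rest."""
    if len(code) < 2 or any(ch not in ALPHABET for ch in code):
        return False
    return check_char(code[:-1]) == code[-1]

code = with_checksum("TWEXB8G")
print(code, is_well_formed(code))
```

A plain sum like this catches any single mistyped character but not two characters swapped around. It also isn't a secret: anyone who works out the rule can produce well-formed codes, so the random part still has to carry the security. What it does buy is a cheap way to reject typos and most random guesses before looking anything up.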
Reply
"I'm sure you could work out the details of how that would work pretty quickly."
Hmm, your faith in me may be misplaced. Never did get checksums.
Reply
One thing you might want to keep in mind from a usability standpoint is to keep the letters in lowercase. That way users can more easily distinguish between letters and numbers. They won't be left wondering if something is a 0 (number zero) or an O (letter O). Of course you then might run into confusion with lowercase l and the number 1. Maybe a good idea to eliminate 0s, Os, ls and 1s altogether. It decreases your pool of available codes but would probably lead to less frustration and a higher success rate.
Reply
The Windows API has a CoCreateGuid() function, which can be called from LotusScript too...
It creates 128-bit integers, but you can perform a base32 conversion on it, so you will get an alphanumeric text string.
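To show what that produces, here is a sketch in Python. It uses the standard library's `uuid.uuid4()` as a stand-in for `CoCreateGuid()` (an assumption, they are not the same call) and standard base32 encoding.

```python
import base64
import uuid

def guid_code() -> str:
    """Random 128-bit GUID rendered as base32 text (padding stripped)."""
    raw = uuid.uuid4().bytes  # 16 bytes, 122 of the bits random
    return base64.b32encode(raw).decode("ascii").rstrip("=")

code = guid_code()
print(code, len(code))
```

The result is always 26 characters long, and standard base32 uses A-Z and 2-7, so it includes look-alikes such as O, I and 2. That makes it very hard to guess but a lot to type from a printed slip.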
Oh yeah, make sure your generated codes do not contain any profanity :-))
Reply
The opposite of security is usually usability - and in this case if the user is typing in the code then it has to be short-ish. IMHO less than 9 and, as Hynek suggested, grouped for readability.
Case sensitivity is to be avoided for good usability, and likewise any similar letters/numbers.
I'm also interested in the google search you'll have to do to try to find the list of unsuitable words to parse out...could be some interesting results. Let us know how that one goes
Reply
I think just removing most of the vowels will remove any risk of profanities popping up.
My new list of chars is:
ACDEFGHJKLMNPQRTWXY34679
If you can spell a naughty word with them you're a smarter fecker than me ;-)
Reply
Hi Jake, if this is Domino, why not use @Unique (without a parameter)? It gives strings that are like this:
AMAG-8BAB9A
Reply
It's not Domino, but, if it were, I'm not sure @unique would cut it.
The first 4 chars are fixed and so there are "only" 308,915,776 possibilities, which I guess is enough in reality, but aren't they produced sequentially?
My code would produce, say, 100 codes at once. Assuming they are in fact guaranteed unique I'm guessing there's a chance that a user who received code AMAG-8BAB9A could then take a stab at AMAG-8BAB9B and AMAG-8BAB9C etc.
Reply
How about using a selection of 1000 short words? You put two words together with a digit between them. Words are easier to type because they are recognizable. This would give you about 10 million combinations I think.
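A rough sketch of the idea in Python. The eight-word list is a hypothetical placeholder for the real list of about 1,000 words.

```python
import secrets

# Hypothetical stand-in list; the suggestion calls for about 1,000 short words.
WORDS = ["apple", "brick", "cloud", "delta", "eagle", "flame", "grape", "house"]

def word_code(words: list = WORDS) -> str:
    """Build a code as word + digit + word, e.g. 'cloud7grape'."""
    return f"{secrets.choice(words)}{secrets.randbelow(10)}{secrets.choice(words)}"

def combinations(word_count: int) -> int:
    """Size of the code space: word_count choices, 10 digits, word_count choices."""
    return word_count * 10 * word_count

print(word_code())
print(f"{combinations(1000):,} combinations with 1,000 words")
```

So 10 million is right for 1,000 words. For comparison, that sits between the 4-character and 6-character rows of the table in the post (614,656 and 481,890,304).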
Reply