Logic To Compare Names of Two People To See If Same
Is Jake Howlett the same person as Howlett Jake or Jake R Howlett?
Without some unique key, such as social security or NI number, to compare there's no way of knowing. Is there a way to take an educated guess though?
This week I had a request to produce a view that listed the names of people who had requested a free voucher along-side the name of the person who actually redeemed the voucher.
The name of the redeemers came in the form of a CSV file from the voucher suppliers and contained two columns -- one for voucher code (the key which binds the two names) and one for redeemer.
My task was simply to produce a "view" so that the company giving the vouchers out (my customer) could look for naughty people.
What they didn't ask for and what I decided to add is some way of highlighting the names that obviously don't match. I did this (see right) partly as a sweetener for the customer but also as a challenge to myself to make an otherwise dull task a little more interesting.
While what I did was in PHP and Flex I've put together a Notes version for you to play with and test the theory.
My PHP code uses the array_intersect() function to count the number of matching items from both names when split on " ". If there is more than 2 intersects then it's considered the same person. If no intersects it's in red. Only one intersect and it's a partial match (orange).
The Notes version (shown below) works in the opposite way in that it counts the number of items left when one array is replaces with the other. The more items left the less of a match it is.
The formula for the first column is:
a1:=@Explode(@LowerCase(Redeemer); " "); a2:=@Explode(@LowerCase(Requestor); " "); remaining:=@Elements(@Trim(@Replace(a1; a2; ""))); @If(remaining=0; 85; remaining=1; 86; 87)
While the formula for the second column is:
a1:=@Explode(@LowerCase(Redeemer); " "); a2:=@Explode(@LowerCase(Requestor); " "); remaining:=@Elements(@Trim(@Replace(a2; a1; ""))); @If(remaining=0; 85; remaining=1; 86; 87)
Notice the subtle difference? The accuracy of the result seems to depend on which way round your replace the lists. Exactly why that is, I can't remember. You guys?
So, the Friday Fun Challenge (should anybody accept it) is to come up with an @Formula for a column to accurately (as is reasonably possible) match two strings to see if they're the name of the same person or not.
Well of course it does matter which way round.
@Replace( sourcelist ; fromlist ; tolist )
If an entry exists in the fromlist, then it doesn't matter if it's in the source list or not. It is simply never used.
As to a solution. Well when you're messing with names it can be a real pain. Did you ask for their common names, or full names, or nick names or what? My Full name is David Richard Andrew Graham Orson Neil Cotterill. Given name is David Cotterill. Nickname is Dragon Cotterill (try abbreviating my full name!)
I suspect that what you have got is a good enough solution. Combining the results of both calculations into one value would help instant checking. If each result returns a value of 0, 1 or 2 as to how well it matches (your smiley/neutral/sad faces) then adding the two togther to get a result works.
Colour coding the lines based on your two results (ie a scale of 0-4, 0 being a complete match, 4 being nothing matches) would be an obvious choice to make the highlights easier to spot. A tag of 3 or 4 (two sad, or one neutral & one sad) is an obvious fraud. 1 or 2 is a possibility (but I certainly wouldn't worry over them) and obviously 0 is full blown match.
Doh!! I knew that as well.
Good idea to combine the two. I'll have a go at adding a third column in a mo...
Another thought just hit me. Do you strip extra characters like hyphens, commas and apostrophes?
Aaron O'Donnell vs ODonnell, A
This would show as a failure on both tests, yet it shouldn't be
Jake - have you looked at @Soundex( string ) ?
I'd be interested to see the php code too.
@Soundex is not very accurate. Having tried it lately, it sounds to me this is going to be more accurate. At least it is more understandable! Soundex results in a somewhat cryptic value.
Why not just combine what you have with:
a1:=@Explode(@LowerCase(Redeemer); " ");
a2:=@Explode(@LowerCase(Requestor); " ");
remaining1:=@Elements(@Trim(@Replace(a1; a2; "")));
remaining2:=@Elements(@Trim(@Replace(a2; a1; "")));
@If(remaining1=0 & remaining2 = 0; 85; a1=a2 ; 87 ; 86)
You could use @Keywords, which does away with the need for lots of @Trim(@Replace malarky...
reds := @LowerCase(@Explode(Redeemer; " "));
reqs := @LowerCase(@Explode(Requestor; " "));
matches := @Elements(@Keywords(n1; n2));
@If(matches=0; ":("; matches=1; ":|"; matches > 1; ":)"; ":?")