Excel 2007 - Generate unique ID based on text?

45,444

Solution 1

Solution Without VBA.

The logic is based on the first 8 characters plus the number of characters in the cell.

=CODE(cell) returns the character code of the first letter

=CODE(MID(cell,2,1)) returns the character code of the second letter

=IFERROR(CODE(MID(cell,9,1)),0) returns 0 if the 9th character does not exist

=LEN(cell) returns the number of characters in the cell

Concatenate the first 8 codes and append the length of the string at the end.

If 8 characters are not enough, add further codes for the following characters in the string.

Final function:

=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
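As a rough cross-check of what the formula produces, here is a small Python sketch that mirrors it character by character. The name `formula_id` is made up for illustration, and the sketch assumes non-empty, plain ASCII names, where Python's `ord` should agree with Excel's `CODE`. It also shows the limitation of the approach: two names that share their first 8 characters and their length get the same ID.

```python
def formula_id(name: str) -> str:
    """Model of the worksheet formula: codes of the first 8 characters, then the length.

    Assumption: name is non-empty ASCII, so ord() matches Excel's CODE().
    """
    parts = []
    for pos in range(8):
        # IFERROR(CODE(MID(...)),0) yields 0 when the character does not exist
        parts.append(str(ord(name[pos])) if pos < len(name) else "0")
    return "".join(parts) + str(len(name))


# Codes 66, 111, 98, then five 0 placeholders, then the length 3
print(formula_id("Bob"))
# Same first 8 characters and same length produce the same ID
print(formula_id("Jonathan Smith") == formula_id("Jonathan Smyth"))  # True
```

If the names in the list can differ only after the 8th character, the formula would need more `CODE(MID(...))` terms, as noted above.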


Solution 2

Sorry, I couldn't find a formula-only solution. This thread (about calculating the points in a Scrabble game) might help, but I didn't find a way to be sure the generated hash would be unique.

Still, here is my solution, based on a UDF (User-Defined Function):

Put the code in a module:

Public Function genId(ByVal sName As String) As Long
'Builds a numeric ID by summing the character code of each character
'multiplied by its position. Different strings can still yield the same ID.
    Dim i As Long 'Long, so that Asc(...) * i cannot overflow an Integer on long strings
    For i = 1 To Len(sName)
        genId = Asc(Mid(sName, i, 1)) * i + genId
    Next i
End Function

And call it in your worksheet like a formula:

=genId(A1)

[EDIT] Added the * i to take into account the order. It works on my unit tests
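To see how the position-weighted sum behaves, here is a Python sketch of the same calculation. The name `gen_id` is a hypothetical port of the VBA function, and it assumes ASCII input, where `ord` should match `Asc`. It shows that two different strings can still share an ID, and it reproduces the counting argument raised in the comments for three-letter words.

```python
def gen_id(name: str) -> int:
    """Position-weighted sum of character codes, modelled on the VBA genId."""
    return sum(ord(ch) * i for i, ch in enumerate(name, start=1))


# 97*1 + 98*2 and 99*1 + 97*2 are both 293: two strings, one ID
print(gen_id("ab"), gen_id("ca"))
# Largest value for a three-letter word over a-z, A-Z is 'zzz': 122*1 + 122*2 + 122*3
print(gen_id("zzz"))  # 732
# Number of possible three-letter strings over a-z and A-Z
print(52 ** 3)  # 140608
```

With far more possible strings than possible sums, collisions are unavoidable in general, although they may not show up in a short list of names.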

Author by

Kenny Bones

Updated on March 31, 2022

Comments

  • Kenny Bones
    Kenny Bones over 2 years

    I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.

    It would be best if I didn't have to use VBA really. But if I have to, so be it.

    • Excellll
      Excellll over 12 years
      Any requirements for the length or characters used in the ID?
  • Kenny Bones
    Kenny Bones over 12 years
    Hi! This works pretty well :) Although, I do get the same results for a few names, if the names have the same number of characters. I think I'll just split the string, pick the first letter of each part, and then add this ID. Should probably be unique then :)
  • JMax
    JMax over 12 years
    Seems like the algo is missing the order! (It will generate the same ID for james Doe and Doe james.) I'll edit my answer to improve my function. (FWIW, I've multiplied each character's code by its index so that it somehow takes the order into account.) I hope that will be enough.
  • JMax
    JMax over 12 years
    @chrisneilsen: why not? I understand this doesn't use any standard lib to create a hash but I wish to read in which case it wouldn't work
  • chris neilsen
    chris neilsen over 12 years
    @JMax consider this: for simple three letter words there are 52^3 = 140608 possible words. Your algo will produce a max number of 732 - clearly you can't produce unique IDs for 140000 words with <700 values! The ratio gets worse the longer the words get.
  • JMax
    JMax over 12 years
    @chrisneilsen: ok got it. Thanks for taking time to answer
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon almost 9 years
    Note: I tried to do this by individually counting the number of characters from a-z in each word and placing that number (assuming 0-9) in the digit of a 10^26 number, and it would have worked if 10^26 wasn't outside of Excel's accuracy with floating point values. Shown here: =TEXT(SUM(LEN(A3)*10^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26},1),""))*10^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}),"#")
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon almost 9 years
    This is not quite unique, because the CODE of an individual character can be either 2 or 3 digits; so a combination of, say, 6 letters may have the same code as a combination of, say, 5 other letters.
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon almost 9 years
    [In the above example Alphabet is a named range containing a single string of "abcd...z"].
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon almost 9 years
    Note that this only works if there are no strings which have the same characters but in a different order, i.e. 21 jump street and 12 jump street would be the same in this method.
  • milan minarovic
    milan minarovic over 8 years
    Provide such examples.
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon over 8 years
    Try converting this string of ASCII codes here back to letters; I count at least 6 ways to make proper names out of this string by flipping 1/2/3 digit characters around: 6510097109236666111983283116463280101116101 [try starting with this pattern: 232331232223222333]. Remember - the key to user inputs being calculated is always dealing with corner cases. It is the unlikely user inputs which create the most pain if your data entry is not able to handle all cases.
  • milan minarovic
    milan minarovic over 8 years
    You are talking about the reverse function. The task was to assign an ID to a list of real, unique names. But it is very simple to enhance my solution so that it avoids mixing 2- and 3-digit ASCII codes: =(1000+CODE(B2))&IFERROR(1000+CODE(MID(B2,2,1)),0)&IFERROR(1000+CODE(MID(B2,3,1)),0)&IFERROR(1000+CODE(MID(B2,4,1)),0)&IFERROR(1000+CODE(MID(B2,5,1)),0)&IFERROR(1000+CODE(MID(B2,6,1)),0)&IFERROR(1000+CODE(MID(B2,7,1)),0)&IFERROR(1000+CODE(MID(B2,8,1)),0)&LEN(B2)
  • Grade 'Eh' Bacon
    Grade 'Eh' Bacon over 8 years
    This is an interesting solution - force all codes to be 4 digit numbers instead of 2 or 3. Easy to do and easy to read after because if the first two digits are 10, then it's a 2 digit code remaining, and otherwise it's a 3 digit code remaining. I like it.
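The fixed-width idea from the last two comments can be sketched in Python as follows. The function names are hypothetical, and the sketch models only the concatenated character codes, not the 0 placeholders or the trailing length. The colliding pair relies on control characters (line feed and tab), which is an assumption about what a cell might contain.

```python
def variable_width(text: str) -> str:
    """Concatenate raw character codes, as the original formula does."""
    return "".join(str(ord(ch)) for ch in text)


def fixed_width(text: str) -> str:
    """Add 1000 to every code so that each character takes exactly 4 digits."""
    return "".join(str(1000 + ord(ch)) for ch in text)


def decode_fixed(digits: str) -> str:
    """Split into 4-digit groups and undo the 1000 offset."""
    return "".join(chr(int(digits[i:i + 4]) - 1000) for i in range(0, len(digits), 4))


a, b = "\n1", "h\t"  # line feed + '1' versus 'h' + tab
print(variable_width(a) == variable_width(b))  # True: 10|49 and 104|9 both read "1049"
print(fixed_width(a) == fixed_width(b))        # False: 1010|1049 versus 1104|1009
print(decode_fixed(fixed_width("Bob")))        # Bob
```

Fixed-width groups make the code part reversible, but the ID still covers only the first 8 characters plus the length, so names that differ only after the 8th character and have equal length would still collide.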