SQL: parse the first, middle and last name from a fullname field

147,481

Solution 1

Here is a self-contained example, with easily manipulated test data.

With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".

If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).

You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.

Edit-- Handle the following special cases:

1 - The NAME field is NULL

2 - The NAME field contains leading / trailing spaces

3 - The NAME field has > 1 consecutive space within the name

4 - The NAME field contains ONLY the first name

5 - Include the original full name in the final output as a separate column, for readability

6 - Handle a specific list of prefixes as a separate "title" column

SELECT
  FIRST_NAME.ORIGINAL_INPUT_DATA
 ,FIRST_NAME.TITLE
 ,FIRST_NAME.FIRST_NAME
 ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
       THEN NULL  --no more spaces?  assume rest is the last name
       ELSE SUBSTRING(
                       FIRST_NAME.REST_OF_NAME
                      ,1
                      ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
                     )
       END AS MIDDLE_NAME
 ,SUBSTRING(
             FIRST_NAME.REST_OF_NAME
            ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
            ,LEN(FIRST_NAME.REST_OF_NAME)
           ) AS LAST_NAME
FROM
  (  
  SELECT
    TITLE.TITLE
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
         THEN TITLE.REST_OF_NAME --No space? return the whole thing
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,1
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
                       )
    END AS FIRST_NAME
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)  
         THEN NULL  --no spaces @ all?  then 1st name is all we have
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                        ,LEN(TITLE.REST_OF_NAME)
                       )
    END AS REST_OF_NAME
   ,TITLE.ORIGINAL_INPUT_DATA
  FROM
    (   
    SELECT
      --if the first three characters are in this list,
      --then pull it as a "title".  otherwise return NULL for title.
      CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
           ELSE NULL
           END AS TITLE
      --if you change the list, don't forget to change it here, too.
      --so much for the DRY prinicple...
     ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
           ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
           END AS REST_OF_NAME
     ,TEST_DATA.ORIGINAL_INPUT_DATA
    FROM
      (
      SELECT
        --trim leading & trailing spaces before trying to process
        --disallow extra spaces *within* the name
        REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
       ,FULL_NAME AS ORIGINAL_INPUT_DATA
      FROM
        (
        --if you use this, then replace the following
        --block with your actual table
              SELECT 'GEORGE W BUSH' AS FULL_NAME
        UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
        UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
        UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
        UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
        UNION SELECT 'TOMMY' AS FULL_NAME
        UNION SELECT 'BILLY' AS FULL_NAME
        UNION SELECT NULL AS FULL_NAME
        UNION SELECT ' ' AS FULL_NAME
        UNION SELECT '    JOHN  JACOB     SMITH' AS FULL_NAME
        UNION SELECT ' DR  SANJAY       GUPTA' AS FULL_NAME
        UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
        UNION SELECT ' MRS  SUSAN ADAMS' AS FULL_NAME
        UNION SELECT ' MS AUGUSTA  ADA   KING ' AS FULL_NAME      
        ) RAW_DATA
      ) TEST_DATA
    ) TITLE
  ) FIRST_NAME

Solution 2

It's difficult to answer without knowing how the "full name" is formatted.

It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.

Basically you'll have to use the SUBSTRING function

SUBSTRING ( expression , start , length )

And probably the CHARINDEX function

CHARINDEX (substr, expression)

To figure out the start and length for each part you want to extract.

So let's say the format is "First Name Last Name" you could (untested.. but should be close) :

SELECT 
SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS FirstName, 
SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, len(fullname)) AS LastName
FROM YourTable

Solution 3

Reverse the problem, add columns to hold the individual pieces and combine them to get the full name.

The reason this will be the best answer is that there is no guaranteed way to figure out a person has registered as their first name, and what is their middle name.

For instance, how would you split this?

Jan Olav Olsen Heggelien

This, while being fictious, is a legal name in Norway, and could, but would not have to, be split like this:

First name: Jan Olav
Middle name: Olsen
Last name: Heggelien

or, like this:

First name: Jan Olav
Last name: Olsen Heggelien

or, like this:

First name: Jan
Middle name: Olav
Last name: Olsen Heggelien

I would imagine similar occurances can be found in most languages.

So instead of trying to interpreting data which does not have enough information to get it right, store the correct interpretation, and combine to get the full name.

Solution 4

Alternative simple way is to use parsename :

select full_name,
   parsename(replace(full_name, ' ', '.'), 3) as FirstName,
   parsename(replace(full_name, ' ', '.'), 2) as MiddleName,
   parsename(replace(full_name, ' ', '.'), 1) as LastName 
from YourTableName

source

Solution 5

Unless you have very, very well-behaved data, this is a non-trivial challenge. A naive approach would be to tokenize on whitespace and assume that a three-token result is [first, middle, last] and a two-token result is [first, last], but you're going to have to deal with multi-word surnames (e.g. "Van Buren") and multiple middle names.

Share:
147,481

Related videos on Youtube

Even Mien
Author by

Even Mien

How many licks does it take to understand a regular expression?

Updated on July 24, 2021

Comments

  • Even Mien
    Even Mien almost 3 years

    How do I parse the first, middle, and last name out of a fullname field with SQL?

    I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.

    The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.

    I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.

  • Bill Karwin
    Bill Karwin over 15 years
    There are also people who have only a first name. Not only celebrities like Madonna and Cher and Bono, but it's traditional in Iceland for example to go by your first name only.
  • Even Mien
    Even Mien over 15 years
    This seems like the practical approach that I need to use. The middle name could be anything that is not included in the First or Last name.
  • Even Mien
    Even Mien over 15 years
    Unfortunately, this is data conversion. It is what it is.
  • Lasse V. Karlsen
    Lasse V. Karlsen over 15 years
    Then you're going to have to build a simple algorithm, and just handle the errors afterwards when you become aware of them.
  • dburges
    dburges over 15 years
    Hey we use Melissa data for zip codes. I didn't know you had something for names, will need to check it out.
  • Franci Penov
    Franci Penov over 15 years
    @Bill Karwin - yep, I mentioned that if you have a person with only one name, it's not clear if it's their first or last name.
  • Even Mien
    Even Mien over 15 years
    I ended up being naive, and it all worked out in the end. Thanks.
  • EfficionDave
    EfficionDave almost 14 years
    Great answer but it doesn't do a good job if the full name includes prefixes (Dr., Mr., Ms.)
  • JosephStyons
    JosephStyons almost 14 years
    @EfficionDave: you are quite right, it won't handle those situations well at all. Those kinds of things are why the answer by Josh Millard is also true. Parsing unruly data is a nontrivial challenge, which is why Google is able to make so much money at it.
  • JosephStyons
    JosephStyons almost 14 years
    @EfficionDave: ok, so I couldn't get it off my mind until I fixed that issue. Check out the revised version; you have to manually provide a list of strings you want to consider "titles" though.
  • EfficionDave
    EfficionDave over 13 years
    I've created a SQL Function based on JosephStyons script above that returns the First Name given the full name. efficionconsulting.com/Blog/itemid/643/amid/1500/…
  • rjmunro
    rjmunro over 12 years
    This isn't an answer. Everyone knows the problem is hard, but it's not impossible to get 90% of the way there if you can tolerate some errors.
  • CodesInChaos
    CodesInChaos over 12 years
    I don't think every name has a meaningful decomposition into first-middle-last. And combining is hard too, since for many Asian names the family names comes before the first name.
  • Digs
    Digs about 11 years
    I've modified the code slightly so that it accepts more titles (mr, mrs, miss, fr, dr, sir, prof, lady, lord) and also so that accepts a dot after the salutation, eg.: "Dr. John Matthews". pastebin.com/xfX9YFdN
  • Digs
    Digs about 11 years
    And here it is as a stand-alone function that can be used like this: SELECT * FROM NameSplitter('Mr. George Washington') pastebin.com/d91VBRe9
  • Kross
    Kross over 7 years
    Nicely done! This is very useful, I made it more DRY by declaring a variable with list of title prefixes.
  • GeoffDS
    GeoffDS over 6 years
    This still doesn't work for 'John Smith Jr'. It puts Jr in the last name and Smith in the middle name. In looking to do something similar, one of my first steps was to break off the suffix by looking for specific values. Then, I worked on splitting the rest into first, middle, last. Then, added suffix back onto last.
  • GeoffDS
    GeoffDS over 6 years
    This doesn't work when there are compound names, like a last name of van buren.
  • JosephStyons
    JosephStyons over 6 years
    @Graphth that sounds like a reasonable approach to me too. You are quite right that your example breaks the code I provided. I'm sure there are lots of potential issues; names can be messy, and handling them like this is going to be awkward. Hopefully it can serve as a starting point for some folks.
  • Nigel Givans
    Nigel Givans over 4 years
    I have encountered a case here this throws a Msg 537, Level 16, State 3, Line 6 Invalid length parameter passed to the LEFT or SUBSTRING function