SQL: parse the first, middle and last name from a fullname field
Solution 1
Here is a self-contained example, with easily manipulated test data.
With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".
If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).
You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.
Edit-- Handle the following special cases:
1 - The NAME field is NULL
2 - The NAME field contains leading / trailing spaces
3 - The NAME field has > 1 consecutive space within the name
4 - The NAME field contains ONLY the first name
5 - Include the original full name in the final output as a separate column, for readability
6 - Handle a specific list of prefixes as a separate "title" column
SELECT
FIRST_NAME.ORIGINAL_INPUT_DATA
,FIRST_NAME.TITLE
,FIRST_NAME.FIRST_NAME
,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
THEN NULL --no more spaces? assume rest is the last name
ELSE SUBSTRING(
FIRST_NAME.REST_OF_NAME
,1
,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
)
END AS MIDDLE_NAME
,SUBSTRING(
FIRST_NAME.REST_OF_NAME
,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
,LEN(FIRST_NAME.REST_OF_NAME)
) AS LAST_NAME
FROM
(
SELECT
TITLE.TITLE
,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
THEN TITLE.REST_OF_NAME --No space? return the whole thing
ELSE SUBSTRING(
TITLE.REST_OF_NAME
,1
,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
)
END AS FIRST_NAME
,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
THEN NULL --no spaces @ all? then 1st name is all we have
ELSE SUBSTRING(
TITLE.REST_OF_NAME
,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
,LEN(TITLE.REST_OF_NAME)
)
END AS REST_OF_NAME
,TITLE.ORIGINAL_INPUT_DATA
FROM
(
SELECT
--if the first three characters are in this list,
--then pull it as a "title". otherwise return NULL for title.
CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
ELSE NULL
END AS TITLE
--if you change the list, don't forget to change it here, too.
--so much for the DRY prinicple...
,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
END AS REST_OF_NAME
,TEST_DATA.ORIGINAL_INPUT_DATA
FROM
(
SELECT
--trim leading & trailing spaces before trying to process
--disallow extra spaces *within* the name
REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),' ',' '),' ',' ') AS FULL_NAME
,FULL_NAME AS ORIGINAL_INPUT_DATA
FROM
(
--if you use this, then replace the following
--block with your actual table
SELECT 'GEORGE W BUSH' AS FULL_NAME
UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
UNION SELECT 'TOMMY' AS FULL_NAME
UNION SELECT 'BILLY' AS FULL_NAME
UNION SELECT NULL AS FULL_NAME
UNION SELECT ' ' AS FULL_NAME
UNION SELECT ' JOHN JACOB SMITH' AS FULL_NAME
UNION SELECT ' DR SANJAY GUPTA' AS FULL_NAME
UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
UNION SELECT ' MRS SUSAN ADAMS' AS FULL_NAME
UNION SELECT ' MS AUGUSTA ADA KING ' AS FULL_NAME
) RAW_DATA
) TEST_DATA
) TITLE
) FIRST_NAME
Solution 2
It's difficult to answer without knowing how the "full name" is formatted.
It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.
Basically you'll have to use the SUBSTRING function
SUBSTRING ( expression , start , length )
And probably the CHARINDEX function
CHARINDEX (substr, expression)
To figure out the start and length for each part you want to extract.
So let's say the format is "First Name Last Name" you could (untested.. but should be close) :
SELECT
SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS FirstName,
SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, len(fullname)) AS LastName
FROM YourTable
Solution 3
Reverse the problem, add columns to hold the individual pieces and combine them to get the full name.
The reason this will be the best answer is that there is no guaranteed way to figure out a person has registered as their first name, and what is their middle name.
For instance, how would you split this?
Jan Olav Olsen Heggelien
This, while being fictious, is a legal name in Norway, and could, but would not have to, be split like this:
First name: Jan Olav
Middle name: Olsen
Last name: Heggelien
or, like this:
First name: Jan Olav
Last name: Olsen Heggelien
or, like this:
First name: Jan
Middle name: Olav
Last name: Olsen Heggelien
I would imagine similar occurances can be found in most languages.
So instead of trying to interpreting data which does not have enough information to get it right, store the correct interpretation, and combine to get the full name.
Solution 4
Alternative simple way is to use parsename
:
select full_name,
parsename(replace(full_name, ' ', '.'), 3) as FirstName,
parsename(replace(full_name, ' ', '.'), 2) as MiddleName,
parsename(replace(full_name, ' ', '.'), 1) as LastName
from YourTableName
Solution 5
Unless you have very, very well-behaved data, this is a non-trivial challenge. A naive approach would be to tokenize on whitespace and assume that a three-token result is [first, middle, last] and a two-token result is [first, last], but you're going to have to deal with multi-word surnames (e.g. "Van Buren") and multiple middle names.
Related videos on Youtube
Even Mien
How many licks does it take to understand a regular expression?
Updated on July 24, 2021Comments
-
Even Mien almost 3 years
How do I parse the first, middle, and last name out of a fullname field with SQL?
I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.
The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.
I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.
-
Bill Karwin over 15 yearsThere are also people who have only a first name. Not only celebrities like Madonna and Cher and Bono, but it's traditional in Iceland for example to go by your first name only.
-
Even Mien over 15 yearsThis seems like the practical approach that I need to use. The middle name could be anything that is not included in the First or Last name.
-
Even Mien over 15 yearsUnfortunately, this is data conversion. It is what it is.
-
Lasse V. Karlsen over 15 yearsThen you're going to have to build a simple algorithm, and just handle the errors afterwards when you become aware of them.
-
dburges over 15 yearsHey we use Melissa data for zip codes. I didn't know you had something for names, will need to check it out.
-
Franci Penov over 15 years@Bill Karwin - yep, I mentioned that if you have a person with only one name, it's not clear if it's their first or last name.
-
Even Mien over 15 yearsI ended up being naive, and it all worked out in the end. Thanks.
-
EfficionDave almost 14 yearsGreat answer but it doesn't do a good job if the full name includes prefixes (Dr., Mr., Ms.)
-
JosephStyons almost 14 years@EfficionDave: you are quite right, it won't handle those situations well at all. Those kinds of things are why the answer by Josh Millard is also true. Parsing unruly data is a nontrivial challenge, which is why Google is able to make so much money at it.
-
JosephStyons almost 14 years@EfficionDave: ok, so I couldn't get it off my mind until I fixed that issue. Check out the revised version; you have to manually provide a list of strings you want to consider "titles" though.
-
EfficionDave over 13 yearsI've created a SQL Function based on JosephStyons script above that returns the First Name given the full name. efficionconsulting.com/Blog/itemid/643/amid/1500/…
-
rjmunro over 12 yearsThis isn't an answer. Everyone knows the problem is hard, but it's not impossible to get 90% of the way there if you can tolerate some errors.
-
CodesInChaos over 12 yearsI don't think every name has a meaningful decomposition into first-middle-last. And combining is hard too, since for many Asian names the family names comes before the first name.
-
Digs about 11 yearsI've modified the code slightly so that it accepts more titles (mr, mrs, miss, fr, dr, sir, prof, lady, lord) and also so that accepts a dot after the salutation, eg.: "Dr. John Matthews". pastebin.com/xfX9YFdN
-
Digs about 11 yearsAnd here it is as a stand-alone function that can be used like this: SELECT * FROM NameSplitter('Mr. George Washington') pastebin.com/d91VBRe9
-
Kross over 7 yearsNicely done! This is very useful, I made it more DRY by declaring a variable with list of title prefixes.
-
GeoffDS over 6 yearsThis still doesn't work for 'John Smith Jr'. It puts Jr in the last name and Smith in the middle name. In looking to do something similar, one of my first steps was to break off the suffix by looking for specific values. Then, I worked on splitting the rest into first, middle, last. Then, added suffix back onto last.
-
GeoffDS over 6 yearsThis doesn't work when there are compound names, like a last name of van buren.
-
JosephStyons over 6 years@Graphth that sounds like a reasonable approach to me too. You are quite right that your example breaks the code I provided. I'm sure there are lots of potential issues; names can be messy, and handling them like this is going to be awkward. Hopefully it can serve as a starting point for some folks.
-
Nigel Givans over 4 yearsI have encountered a case here this throws a Msg 537, Level 16, State 3, Line 6 Invalid length parameter passed to the LEFT or SUBSTRING function