Natural Sort in MySQL
Solution 1
I think this is why a lot of things are sorted by release date.
A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title which conforms to a pattern you create for easy sorting or a counter.
Solution 2
Here is a quick solution:
SELECT alphanumeric,
integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
Solution 3
Same function as posted by @plalx, but rewritten to MySQL:
DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000))
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE position int;
DECLARE tmp_position int;
SET position = 5000;
SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
IF (position = 5000) THEN RETURN 0; END IF;
RETURN position;
END
;;
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE sortString varchar(4000);
DECLARE numStartIndex int;
DECLARE numEndIndex int;
DECLARE padLength int;
DECLARE totalPadLength int;
DECLARE i int;
DECLARE sameOrderCharsLen int;
SET totalPadLength = 0;
SET instring = TRIM(instring);
SET sortString = instring;
SET numStartIndex = udf_FirstNumberPos(instring);
SET numEndIndex = 0;
SET i = 1;
SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);
WHILE (i <= sameOrderCharsLen) DO
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
SET i = i + 1;
END WHILE;
WHILE (numStartIndex <> 0) DO
SET numStartIndex = numStartIndex + numEndIndex;
SET numEndIndex = numStartIndex;
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
SET numEndIndex = numEndIndex + 1;
END WHILE;
SET numEndIndex = numEndIndex - 1;
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
IF padLength < 0 THEN
SET padLength = 0;
END IF;
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
SET totalPadLength = totalPadLength + padLength;
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
END WHILE;
RETURN sortString;
END
;;
Usage:
SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
Solution 4
I've written this function for MSSQL 2000 a while ago:
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* @author Alexandre Potvin Latreille (plalx)
* @param {nvarchar(4000)} string The formatted string.
* @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* @return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
CREATE FUNCTION dbo.udf_NaturalSortFormat(
@string nvarchar(4000),
@numberLength int = 10,
@sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE @sortString varchar(4000),
@numStartIndex int,
@numEndIndex int,
@padLength int,
@totalPadLength int,
@i int,
@sameOrderCharsLen int;
SELECT
@totalPadLength = 0,
@string = RTRIM(LTRIM(@string)),
@sortString = @string,
@numStartIndex = PATINDEX('%[0-9]%', @string),
@numEndIndex = 0,
@i = 1,
@sameOrderCharsLen = LEN(@sameOrderChars);
-- Replace all char that has to have the same order by a space.
WHILE (@i <= @sameOrderCharsLen)
BEGIN
SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
SET @i = @i + 1;
END
-- Pad numbers with zeros.
WHILE (@numStartIndex <> 0)
BEGIN
SET @numStartIndex = @numStartIndex + @numEndIndex;
SET @numEndIndex = @numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
BEGIN
SET @numEndIndex = @numEndIndex + 1;
END
SET @numEndIndex = @numEndIndex - 1;
SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);
IF @padLength < 0
BEGIN
SET @padLength = 0;
END
SET @sortString = STUFF(
@sortString,
@numStartIndex + @totalPadLength,
0,
REPLICATE('0', @padLength)
);
SET @totalPadLength = @totalPadLength + @padLength;
SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
END
RETURN @sortString;
END
GO
Solution 5
MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data set up as you've described above (separate id field, etc), or failing that, perform a sort based on a non-title element, indexed element in your db (date, inserted id in the db, etc).
Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting it there, so if you've any control at all over the db schema here, then look at adding easily-sorted fields as described above, it'll save you a lot of hassle and maintenance in the long run.
Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY
part of the query, e.g.
SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned)
This sort of solution could just about be made to work on your Final Fantasy example above, but isn't particularly flexible and unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007" I'm afraid.
Nishan
Professional Web Developer since 2001, amateur developer since 198x. Eating and breathing JavaScript and PHP in my day-to-day live, but have seen a lot in my 30+ years of code-juggling. Adobe Certified Expert - Adobe Analytics Developer
Updated on July 08, 2022Comments
-
Nishan almost 2 years
Is there an elegant way to have performant, natural sorting in a MySQL database?
For example if I have this data set:
- Final Fantasy
- Final Fantasy 4
- Final Fantasy 10
- Final Fantasy 12
- Final Fantasy 12: Chains of Promathia
- Final Fantasy Adventure
- Final Fantasy Origins
- Final Fantasy Tactics
Any other elegant solution than to split up the games' names into their components
- Title: "Final Fantasy"
- Number: "12"
- Subtitle: "Chains of Promathia"
to make sure that they come out in the right order? (10 after 4, not before 2).
Doing so is a pain in the a** because every now and then there's another game that breaks that mechanism of parsing the game title (e.g. "Warhammer 40,000", "James Bond 007")
-
Nishan about 13 yearsTheoretically that's possible, but I would need to read all database records to my webserver first.
-
fortboise over 12 yearsThat's nice if everything is "Final Fantasy", but it puts "Goofy" ahead of the FF suite.
-
Borut Tomazin over 11 yearsThis is the only solution that really works. I've also tested drupals code but it fails sometimes. Thanks man!
-
Borut Tomazin over 11 yearsThis solution does not works all the time. It breaks sometimes. You should rather use this one: stackoverflow.com/a/12257917/384864
-
antoine over 10 yearsI don't know how performant this would be. I am using it all the time without any inconveniences. My database isn't big tho.
-
offby1 over 9 yearsPiling kludge upon kludge:
SELECT alphanumeric, integer FROM sorting_test ORDER BY SOUNDEX(alphanumeric), LENGTH(alphanumeric), alphanumeric
. If this works at all, it's because SOUNDEX conveniently discards the numbers, thus ensuring that e.g.apple1
comes beforez1
. -
Asped over 9 yearsgreat solution, thanks, though I had to do switch
alphanmuric
,length(alphanumeric)
to avoid "Goofy" before "Final Fantasy" -
Pyton over 7 yearsWorks very well to sorting numbers in format
23-4244
. Thanks :) -
m47730 over 7 yearswhy don't remove this answer? you'll get a badge for this
-
Mark Steudel about 7 yearsAnyone use this on really big tables 10+ million?
-
plalx about 7 years@MarkSteudel You would have to give it a go and test it for yourself. At worse you could always cache the formatted values. That's probably what I would do for large tables because you could index the field as well.
-
random_user_name almost 7 yearsUnfortunately this breaks down if you add values in such as
a1
,a2
,a11
, etc... -
JamesUsedHarden over 6 yearsonly works with this test data because the strings before the number are all the same. Try sticking in a value
z_99
in there and it will get put at the top butz
comes afterv
. -
Tarik over 6 years@SamuelNeff please see SQL: ORDER BY LENGTH(test_column) DESC, test_column DESC so yes, because it will sort by length of the column first. This works well sorting a prefix group of table which otherwise you would not be able to sort with only "test_column DESC"
-
Christian over 6 yearsI just wrote a class for exactly that stackoverflow.com/a/47522040/935122
-
Ben Hitchcock about 6 yearsThanks for this, I've been trying all sorts of solutions (ha ha see what I did there?) but none of them really worked for all the data I had. The drupal function worked like a charm. Thanks for posting.
-
Jacob over 5 years@MarkSteudel We use a function similar to this (though not this exact one) for natural sorting on several tables, the largest of which is ~5 million rows. However, we don't call it directly in our queries but instead use it to set the value of a
nat_name
column. We use a trigger to run the function every time a row is updated. This approach gives you natural sorting with no real performance cost at the expense of an additional column. -
prathima krishna almost 5 yearsthis works but sorts numbers at the end (A-Z then 0-9)
-
prathima krishna almost 5 yearsthis works, sorting numbers before letters, and can be implemented in Drupal using hook_views_query_alter, using something similiar to this
if ($query->orderby[0]["field"] === "node_field_data.title") { $orderBySql = " udf_NaturalSortFormat(node_field_data.title, 10, '.') "; $query->orderby = []; $query->addOrderBy(NULL, $orderBySql, $query->orderby[0]["direction"], 'title_natural'); array_unshift($query->orderby, end($query->orderby)); }
-
Raymond Nijland over 4 years@offby1 suggestion only works if the text is 100% written in english as
SOUNDEX()
is designed to only work correctly on english words. -
Doin over 4 yearsThis is definitely the right approach, but is hardly an answer by itself!
-
John T over 4 yearsI'm a beginner in MySQL and tried this. Got this error: "#1305 - FUNCTION mydatabase.REGEXP_INSTR does not exist". Any idea?
-
John T over 4 yearsFor any other newbie out there. I didn't have MySQL 8.0 installed. It's needed for REGEXP_INSTR(and other REGEXP stuff).
-
Avatar over 4 yearsAlternatively consider:
usort($mydata, function ($item1, $item2) { return strnatcmp($item1['key'], $item2['key']); });
(I have an associative array and sort by key.) Ref: stackoverflow.com/q/12426825/1066234 -
Avatar about 4 yearsThe
LPAD()
hint was very helpful. I have words and numbers to sort, withLPAD
I could sort the numbers naturally. And usingCONCAT
I ignore non-numbers. My query looks like this (alias is the column to sort):IF(CONCAT("",alias*1)=alias, LPAD(alias,5,"0"), alias) ASC;
👍 -
Doin almost 4 yearsJust fixed a serious bug in NatSortKey: there was an incorrect regex character. If you've used this function yourself, please update your code!
-
YSC over 3 yearsThat's sad we should resort to such tricks, nice trick anyway.
-
dresende about 3 yearsThis is a great and very simple answer to natural sorting numbers. Thank you.
-
Xriuk almost 2 yearsHere's a version for SQL Server, if anyone is interested: gist.github.com/Xriuk/93288e4527dbe0ce1936e748dfdfb7df