Natural Sort in MySQL

86,635

Solution 1

I think this is why a lot of things are sorted by release date.

A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title which conforms to a pattern you create for easy sorting or a counter.

Solution 2

Here is a quick solution:

SELECT alphanumeric, 
       integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric

Solution 3

Same function as posted by @plalx, but rewritten to MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;

Usage:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")

Solution 4

I've written this function for MSSQL 2000 a while ago:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The formatted string.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use: 
 * 
 *      SELECT Name FROM TableA ORDER BY Name
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1-1.       
 *  2.  A1-1.                   2.  A1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R11
 *  5.  R2                  5.  R2
 *
 *  
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *  TableA (unordered)              TableA (ordered)
 *  ------------                    ------------
 *  ID  Name                    ID  Name
 *  1.  A1.                 1.  A1.     
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1      -->         3.  R1
 *  4.  R11                 4.  R2
 *  5.  R2                  5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO

Solution 5

MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data set up as you've described above (separate id field, etc), or failing that, perform a sort based on a non-title element, indexed element in your db (date, inserted id in the db, etc).

Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting it there, so if you've any control at all over the db schema here, then look at adding easily-sorted fields as described above, it'll save you a lot of hassle and maintenance in the long run.

Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.

SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned) 

This sort of solution could just about be made to work on your Final Fantasy example above, but isn't particularly flexible and unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007" I'm afraid.

Share:
86,635
Nishan
Author by

Nishan

Professional Web Developer since 2001, amateur developer since 198x. Eating and breathing JavaScript and PHP in my day-to-day live, but have seen a lot in my 30+ years of code-juggling. Adobe Certified Expert - Adobe Analytics Developer

Updated on July 08, 2022

Comments

  • Nishan
    Nishan almost 2 years

    Is there an elegant way to have performant, natural sorting in a MySQL database?

    For example if I have this data set:

    • Final Fantasy
    • Final Fantasy 4
    • Final Fantasy 10
    • Final Fantasy 12
    • Final Fantasy 12: Chains of Promathia
    • Final Fantasy Adventure
    • Final Fantasy Origins
    • Final Fantasy Tactics

    Any other elegant solution than to split up the games' names into their components

    • Title: "Final Fantasy"
    • Number: "12"
    • Subtitle: "Chains of Promathia"

    to make sure that they come out in the right order? (10 after 4, not before 2).

    Doing so is a pain in the a** because every now and then there's another game that breaks that mechanism of parsing the game title (e.g. "Warhammer 40,000", "James Bond 007")

  • Nishan
    Nishan about 13 years
    Theoretically that's possible, but I would need to read all database records to my webserver first.
  • fortboise
    fortboise over 12 years
    That's nice if everything is "Final Fantasy", but it puts "Goofy" ahead of the FF suite.
  • Borut Tomazin
    Borut Tomazin over 11 years
    This is the only solution that really works. I've also tested drupals code but it fails sometimes. Thanks man!
  • Borut Tomazin
    Borut Tomazin over 11 years
    This solution does not works all the time. It breaks sometimes. You should rather use this one: stackoverflow.com/a/12257917/384864
  • antoine
    antoine over 10 years
    I don't know how performant this would be. I am using it all the time without any inconveniences. My database isn't big tho.
  • offby1
    offby1 over 9 years
    Piling kludge upon kludge: SELECT alphanumeric, integer FROM sorting_test ORDER BY SOUNDEX(alphanumeric), LENGTH(alphanumeric), alphanumeric. If this works at all, it's because SOUNDEX conveniently discards the numbers, thus ensuring that e.g. apple1 comes before z1.
  • Asped
    Asped over 9 years
    great solution, thanks, though I had to do switch alphanmuric,length(alphanumeric) to avoid "Goofy" before "Final Fantasy"
  • Pyton
    Pyton over 7 years
    Works very well to sorting numbers in format 23-4244. Thanks :)
  • m47730
    m47730 over 7 years
    why don't remove this answer? you'll get a badge for this
  • Mark Steudel
    Mark Steudel about 7 years
    Anyone use this on really big tables 10+ million?
  • plalx
    plalx about 7 years
    @MarkSteudel You would have to give it a go and test it for yourself. At worse you could always cache the formatted values. That's probably what I would do for large tables because you could index the field as well.
  • random_user_name
    random_user_name almost 7 years
    Unfortunately this breaks down if you add values in such as a1, a2, a11, etc...
  • JamesUsedHarden
    JamesUsedHarden over 6 years
    only works with this test data because the strings before the number are all the same. Try sticking in a value z_99 in there and it will get put at the top but z comes after v.
  • Tarik
    Tarik over 6 years
    @SamuelNeff please see SQL: ORDER BY LENGTH(test_column) DESC, test_column DESC so yes, because it will sort by length of the column first. This works well sorting a prefix group of table which otherwise you would not be able to sort with only "test_column DESC"
  • Christian
    Christian over 6 years
    I just wrote a class for exactly that stackoverflow.com/a/47522040/935122
  • Ben Hitchcock
    Ben Hitchcock about 6 years
    Thanks for this, I've been trying all sorts of solutions (ha ha see what I did there?) but none of them really worked for all the data I had. The drupal function worked like a charm. Thanks for posting.
  • Jacob
    Jacob over 5 years
    @MarkSteudel We use a function similar to this (though not this exact one) for natural sorting on several tables, the largest of which is ~5 million rows. However, we don't call it directly in our queries but instead use it to set the value of a nat_name column. We use a trigger to run the function every time a row is updated. This approach gives you natural sorting with no real performance cost at the expense of an additional column.
  • prathima krishna
    prathima krishna almost 5 years
    this works but sorts numbers at the end (A-Z then 0-9)
  • prathima krishna
    prathima krishna almost 5 years
    this works, sorting numbers before letters, and can be implemented in Drupal using hook_views_query_alter, using something similiar to this if ($query->orderby[0]["field"] === "node_field_data.title") { $orderBySql = " udf_NaturalSortFormat(node_field_data.title, 10, '.') "; $query->orderby = []; $query->addOrderBy(NULL, $orderBySql, $query->orderby[0]["direction"], 'title_natural'); array_unshift($query->orderby, end($query->orderby)); }
  • Raymond Nijland
    Raymond Nijland over 4 years
    @offby1 suggestion only works if the text is 100% written in english as SOUNDEX() is designed to only work correctly on english words.
  • Doin
    Doin over 4 years
    This is definitely the right approach, but is hardly an answer by itself!
  • John T
    John T over 4 years
    I'm a beginner in MySQL and tried this. Got this error: "#1305 - FUNCTION mydatabase.REGEXP_INSTR does not exist". Any idea?
  • John T
    John T over 4 years
    For any other newbie out there. I didn't have MySQL 8.0 installed. It's needed for REGEXP_INSTR(and other REGEXP stuff).
  • Avatar
    Avatar over 4 years
    Alternatively consider: usort($mydata, function ($item1, $item2) { return strnatcmp($item1['key'], $item2['key']); }); (I have an associative array and sort by key.) Ref: stackoverflow.com/q/12426825/1066234
  • Avatar
    Avatar about 4 years
    The LPAD() hint was very helpful. I have words and numbers to sort, with LPAD I could sort the numbers naturally. And using CONCAT I ignore non-numbers. My query looks like this (alias is the column to sort): IF(CONCAT("",alias*1)=alias, LPAD(alias,5,"0"), alias) ASC; 👍
  • Doin
    Doin almost 4 years
    Just fixed a serious bug in NatSortKey: there was an incorrect regex character. If you've used this function yourself, please update your code!
  • YSC
    YSC over 3 years
    That's sad we should resort to such tricks, nice trick anyway.
  • dresende
    dresende about 3 years
    This is a great and very simple answer to natural sorting numbers. Thank you.
  • Xriuk
    Xriuk almost 2 years
    Here's a version for SQL Server, if anyone is interested: gist.github.com/Xriuk/93288e4527dbe0ce1936e748dfdfb7df