Primary Key Sorting

Solution 1

Data is physically stored by clustered index, which is usually the primary key but doesn't have to be.

A SQL result set has no guaranteed order without an ORDER BY clause. You should always specify an ORDER BY clause when you need the data in a particular order. If the rows can already be read in that order (for example, by scanning the clustered index), the optimizer can usually skip the sort, so there's no harm in having it there.
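To make the "no extra work" point concrete, here is a minimal sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in that runs anywhere; it is not SQL Server, and SQL Server's plans and storage engine differ. The `readings` table, its columns, and the helper names are hypothetical. In SQLite an INTEGER PRIMARY KEY table is stored in key order, which plays roughly the role of a clustered primary key here.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable 'detail' strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def build_demo():
    """Create a small in-memory table (hypothetical schema) with rows inserted out of key order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT, value REAL)"
    )
    rows = [
        (3, "2009-01-03", 1.5),
        (1, "2009-01-01", 0.5),
        (2, "2009-01-02", 1.0),
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return conn


conn = build_demo()

# ORDER BY is what guarantees the order of the result.
ids = [r[0] for r in conn.execute("SELECT id FROM readings ORDER BY id")]

# Ordering by the key the table is stored by: the plan has no separate sort step.
plan_by_id = plan_details(conn, "SELECT * FROM readings ORDER BY id")

# Ordering by an unindexed column: the plan includes a temporary b-tree sort.
plan_by_value = plan_details(conn, "SELECT * FROM readings ORDER BY value")

print(ids)
print(plan_by_id)
print(plan_by_value)
```

The exact plan text varies between SQLite versions, so the thing to look for is only whether a "TEMP B-TREE FOR ORDER BY" step appears. On SQL Server the equivalent check would be to look for a Sort operator in the execution plan.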

Without an ORDER BY clause, the RDBMS might return cached pages matching your query while it waits for records to be read in from disk. In that case, even if there is an index on the table, the data might not arrive in the index's order. (Note this is just an example. I don't know or even think that a real-world RDBMS will do this, but it's acceptable behaviour for an SQL implementation.)

EDIT

If you see a performance difference between sorting and not sorting, you're probably sorting on a column (or set of columns) that doesn't have an index (clustered or otherwise). Given that it's a time series, you might be sorting based on time, but the clustered index is on the primary bigint. SQL Server doesn't know that both increase the same way, so it has to re-sort everything.

If the time column and the primary key column are related by order (one increases if and only if the other increases or stays the same), sort by the primary key instead. If they aren't related this way, move the clustered index from the primary key to whatever column(s) you're sorting by.
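A rough sketch of that decision, again using SQLite via Python's sqlite3 only as a portable stand-in (not SQL Server). The `readings` schema and the helpers `needs_sort` and `key_tracks_time` are hypothetical names made up for illustration. SQLite cannot move a clustered index, so the last step swaps in a different technique: an ordinary index on the time column, which lets the engine read that column in order without a sort.

```python
import sqlite3


def needs_sort(conn, sql):
    """True if SQLite's plan for `sql` contains a temporary b-tree sort step."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return any("TEMP B-TREE" in row[3] for row in plan)


def key_tracks_time(conn):
    """True if ts never decreases as id increases (the 'related by order' check)."""
    ts_values = [r[0] for r in conn.execute("SELECT ts FROM readings ORDER BY id")]
    return all(a <= b for a, b in zip(ts_values, ts_values[1:]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings (ts, value) VALUES (?, ?)",
    [
        ("2009-01-01", 0.5),
        ("2009-01-02", 1.0),
        ("2009-01-02", 1.2),  # equal timestamps are fine: ts only has to not decrease
        ("2009-01-03", 1.5),
    ],
)

# Sorting on the unindexed time column forces a sort.
sort_on_ts_before = needs_sort(conn, "SELECT * FROM readings ORDER BY ts")

# If the key tracks time, ordering by the key gives the same sequence without a sort.
related = key_tracks_time(conn)
sort_on_id = needs_sort(conn, "SELECT * FROM readings ORDER BY id")

# Otherwise, index the column you sort by (stand-in for moving the clustered index).
conn.execute("CREATE INDEX readings_ts ON readings (ts)")
sort_on_ts_after = needs_sort(conn, "SELECT ts FROM readings ORDER BY ts")

print(sort_on_ts_before, related, sort_on_id, sort_on_ts_after)
```

The order check reads the whole table once, so on months of data it is something to run occasionally rather than on every query.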

Solution 2

Without an explicit ORDER BY, there is no default sort order. This is a very common question, so there is a canned answer:

Without ORDER BY, there is no default sort order.

Can you elaborate why "The performance difference is significant."?

Solution 3

You must apply the ORDER BY to guarantee an order. If you are noticing a performance difference, then it is likely that your data was not coming back sorted without the ORDER BY in place. Otherwise SQL Server must be behaving badly, since it is not realizing the data is already sorted. Adding the ORDER BY on already sorted data should not incur a performance penalty, since the RDBMS should be smart enough to recognize the order of the data.

Solution 4

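A table is not necessarily 'clustered', i.e. physically organized by its PK. Without a clustered index it is a "HEAP" (rows in no particular order), and the option you are looking for is "CLUSTERED". SQL Server makes the primary key clustered by default unless you say otherwise or the table already has a clustered index; in Oracle the equivalent is called an index-organized table (IOT).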

  • A table can only have one CLUSTERED index (makes sense, since the rows can only be stored in one order)
  • Use the PRIMARY KEY CLUSTERED syntax in the DDL
  • You still need to issue ORDER BY on the PK in your SELECTs. Because the table is clustered, the query will run faster: the optimizer knows it can read the clustered index in order and skip the sort

The earlier poster is correct: SQL (and the relational theory behind it) specifically defines the result of a SELECT as an unordered set of tuples.

SQL usually tries to stay in the logical-realm and not make assumptions about the physical organization / locations etc. of the data. The CLUSTERED option allows us to do that for practical real-life situations.

Author by

Admin

Updated on August 22, 2022

Comments

  • Admin
    Admin over 1 year

    Is a table intrinsically sorted by its primary key? If I have a table with the primary key on a BigInt identity column, can I trust that queries will always return the data sorted by the key, or do I explicitly need to add the "ORDER BY"? The performance difference is significant.

  • marc_s
    marc_s almost 15 years
    only if the primary key is also the CLUSTERING KEY - which it is by default, but doesn't HAVE to be.......
  • Philip Kelley
    Philip Kelley almost 15 years
    The first paragraph should say "Data is physically stored by clustered index...". Everything else Welbog says applies--just because it's physically stored [within each page] in an order doesn't mean you'll get it back in that order. Physical disk fragmentation may also have an impact on this.
  • Welbog
    Welbog almost 15 years
    @Philip Kelley: Changed to reflect your better phrasing. Thanks.
  • Admin
    Admin almost 15 years
    The data is a time series and the queries are pulling back months' worth of data. Without the ORDER BY, the stored procedure is able to begin returning rows within seconds. With the ORDER BY, it is up to a minute before the first row returns.
  • Admin
    Admin almost 15 years
    I am actually sorting on the Primary Key (which is the BigInt). The data has been inserted in an ordered fashion (by date).
  • Welbog
    Welbog almost 15 years
    Is the primary key a clustered index?
  • Admin
    Admin almost 15 years
    The Primary Key is clustered and the key is the ID (BigInt) field.