Group by every N records in T-SQL


Solution 1

WITH T AS (
  SELECT RANK() OVER (ORDER BY ID) Rank,
    P.Field1, P.Field2, P.Value1, ...
  FROM P
)
SELECT (Rank - 1) / 1000 GroupID, AVG(...)
FROM T
GROUP BY ((Rank - 1) / 1000)
;

Something like that should get you started. If you can provide your actual schema I can update as appropriate.
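As a hedged illustration, here is the same pattern filled in for the table in the question's own query (dbo.Results, ordered by id, averaging Throughput). It assumes id is unique, so RANK() produces no ties, and that Throughput is numeric. The aliases Rnk, RowsInGroup and AvgThroughput are made up for the example.

```sql
-- Sketch only: assumes dbo.Results(id, TestId, Throughput) as in the question,
-- with unique id values and a numeric Throughput column.
WITH T AS (
  SELECT RANK() OVER (ORDER BY id) AS Rnk,
         Throughput
  FROM dbo.Results
)
SELECT (Rnk - 1) / 1000               AS GroupID,       -- integer division: 0, 1, 2, ...
       COUNT(*)                       AS RowsInGroup,   -- exposes a short final group
       AVG(CAST(Throughput AS float)) AS AvgThroughput  -- cast avoids integer averaging
FROM T
GROUP BY (Rnk - 1) / 1000
ORDER BY GroupID;
```

Including COUNT(*) next to the average makes it easy to see whether the last group is smaller than the rest.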

Solution 2

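Give the answer to Yuck; I am only posting this as an answer so that I can include a code block. I ran a count test to check that it groups by 1000. At one point the first set came out at 999 rows (which is what happens if Rank is divided by 1000 without first subtracting 1). The query below, using (Rank-1), produced set sizes of 1,000. Great query, Yuck.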

    WITH T AS (
    SELECT RANK() OVER (ORDER BY sID) Rank, sID 
    FROM docSVsys
    )
    SELECT (Rank-1) / 1000 GroupID, count(sID)
    FROM T
    GROUP BY ((Rank-1) / 1000)
    order by GroupID 

Solution 3

I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().

Reason being, if you have 10,010 records (for example), then you'll have 11 groupings -- the first 10 with 1000 in them, and the last with just 10.

If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.

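NTILE() would make all groups as close to the same size as possible (sizes differ by at most one row when the row count does not divide evenly); the only caveat is that you'd need to know how many groups you wanted.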

So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.

You could get the number of groups (not the group size) simply by

DECLARE @ntile int
SET  @ntile = (SELECT count(1) from myTable) / 1000

And then modifying @Yuck's approach with the NTILE() substitution:

;WITH myCTE AS (
  SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
    col1, col2, ...
  FROM dbo.myTable
)
SELECT myGroup, col1, col2...
FROM myCTE
GROUP BY (myGroup), col1, col2...
;
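To see the bucket sizes NTILE() produces, a small self-contained check can help. The sketch below is an assumption-laden demo, not part of the original answer: it fills a hypothetical table variable @t with 10,010 generated ids (borrowing rows from the sys.all_objects catalog view purely as a row source), derives the number of groups as above, and counts the rows per group.

```sql
-- Hypothetical sample data: 10,010 distinct ids in a table variable.
DECLARE @t TABLE (id int PRIMARY KEY);

INSERT INTO @t (id)
SELECT TOP (10010) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;   -- cross join only to get enough rows

-- Number of groups, as in the answer: 10010 / 1000 = 10 (integer division).
DECLARE @ntile int = (SELECT COUNT(*) FROM @t) / 1000;

-- Count the rows that land in each NTILE bucket.
SELECT myGroup, COUNT(*) AS RowsInGroup
FROM (
  SELECT NTILE(@ntile) OVER (ORDER BY id) AS myGroup
  FROM @t
) AS x
GROUP BY myGroup
ORDER BY myGroup;
-- Returns 10 groups of 1001 rows each, instead of ten groups of 1000 plus one of 10.
```

Note that a table with fewer than 1000 rows would make @ntile zero, which NTILE() does not accept, so a real query would need to guard against that case.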

Solution 4

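The answer above does not assign a single group id to each block of 1000 records when `/` performs floating-point division, as it does in BigQuery; there, adding Floor() is needed. (In T-SQL, dividing one integer by another already truncates, so Floor() is not required.) The following will return all records from your table, with a unique GroupID for each 1000 rows: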

WITH T AS (
  SELECT RANK() OVER (ORDER BY your_field) Rank,
    your_field
  FROM your_table
  WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
FROM T

And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:

TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 10) AS STRING),'seed1'))) GroupID

Without the seed value, you and I would get exactly the same output, because we would just be computing SHA256 of the numbers 1, 2, 3, etc. Adding the seed makes the output specific to you while keeping it repeatable.

This is BigQuery syntax; T-SQL differs. It has no TO_HEX() or SHA256() functions and no STRING type, and its closest built-in for hashing is HASHBYTES().
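A rough T-SQL counterpart of the hashed GroupID might look like the sketch below. It is an assumption, not the original author's code: it uses HASHBYTES with the 'SHA2_256' algorithm and CONCAT (both available from SQL Server 2012), and a handful of hypothetical rank values in place of a real table. The hash strings are not guaranteed to match BigQuery's, since the two systems may render the number as text differently.

```sql
-- Sketch: seeded SHA-256 group ids in T-SQL, groups of 10 as in the snippet above.
-- The Rnk values are made-up sample ranks.
SELECT v.Rnk,
       (v.Rnk - 1) / 10 AS NumericGroup,          -- integer division, no FLOOR needed
       LOWER(CONVERT(varchar(64),
             HASHBYTES('SHA2_256',
                       CONCAT(CAST((v.Rnk - 1) / 10 AS varchar(20)), 'seed1')),
             2)) AS GroupID                        -- style 2: hex digits without '0x'
FROM (VALUES (1), (10), (11), (25)) AS v(Rnk);
```

Ranks 1 and 10 fall in the same numeric group and therefore share a GroupID, while 11 and 25 each get a different one.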

Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by counting the rows in each group; ordered by that count, the partial chunk comes first:

WITH T AS (
  SELECT RANK() OVER (ORDER BY your_field) Rank,
    your_field
  FROM your_table
  WHERE your_field = 'your_criteria'
)
SELECT Floor((Rank-1) / 1000) GroupID, your_field
, COUNT(*) OVER(PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1')))) AS CountInGroup
FROM T
ORDER BY CountInGroup
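The query above only surfaces the partial chunk. To actually drop it, one option is to filter on the per-group count, as in this T-SQL sketch. The table variable @t and its 2,500 generated ids are hypothetical sample data, and ROW_NUMBER() is swapped in for RANK() so that ties cannot distort the group sizes.

```sql
-- Hypothetical sample data: 2,500 distinct ids.
DECLARE @t TABLE (id int PRIMARY KEY);

INSERT INTO @t (id)
SELECT TOP (2500) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;   -- cross join only to get enough rows

WITH T AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
  FROM @t
),
G AS (
  SELECT id,
         (rn - 1) / 1000 AS GroupID,
         COUNT(*) OVER (PARTITION BY (rn - 1) / 1000) AS CountInGroup
  FROM T
)
SELECT GroupID, id
FROM G
WHERE CountInGroup = 1000          -- keep only complete groups
ORDER BY GroupID, id;
-- Returns 2,000 rows (groups 0 and 1); the 500-row remainder is excluded.
```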

Solution 5

You can also use Row_Number() instead of Rank(). No Floor() is required in T-SQL, because dividing one integer by another truncates.

declare @GroupSize int = 50

;with ct1 as (  select YourColumn, RowID = Row_Number() over(order by YourColumn)
                from YourTable
             )

select YourColumn, RowID, GroupID = (RowID-1)/@GroupSize + 1
from ct1
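One practical reason to prefer Row_Number() shows up when the ordering column contains duplicates: RANK() gives tied rows the same number, so a group built from (Rank-1)/N can end up with more than N rows, while Row_Number() always numbers rows 1, 2, 3, and so on. The tiny demo below uses made-up values to show the difference.

```sql
-- Made-up sample values with one tie ('a' appears twice).
SELECT v.YourColumn,
       RANK()       OVER (ORDER BY v.YourColumn) AS RankValue,
       ROW_NUMBER() OVER (ORDER BY v.YourColumn) AS RowID
FROM (VALUES ('a'), ('a'), ('b'), ('c')) AS v(YourColumn)
ORDER BY RowID;
-- RankValue: 1, 1, 3, 4
-- RowID:     1, 2, 3, 4
```

With Row_Number(), which of two tied rows comes first is arbitrary unless a tie-breaking column is added to the ORDER BY.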

Author by

ahmet alp balkan

I am a software engineer on Twitter compute infrastructure team. Previously I've worked at Google Cloud on Kubernetes, Cloud Run and Knative, and at Microsoft Azure on various parts of the Docker open source ecosystem. Find me on my: (blog | twitter | github)

Updated on June 10, 2022

Comments

  • ahmet alp balkan
    ahmet alp balkan about 2 years

    I have some performance test results on the database, and what I want to do is to group every 1000 records (previously sorted in ascending order by date) and then aggregate results with AVG.

    I'm actually looking for a standard SQL solution, however any T-SQL specific results are also appreciated.

    The query looks like this:

    SELECT TestId,Throughput  FROM dbo.Results ORDER BY id
    
  • paparazzo
    paparazzo almost 13 years
    If you use Yuck straight up you could include a count(*) so at least you are aware of the last group size.
  • sheldonhull
    sheldonhull over 7 years
    I've messed around with a lot of SQL functionality, but had never come across this function. Fantastic example; I had a good use case for it. Thanks for this answer and the detailed response.
  • stomy
    stomy over 6 years
    See the NTILE docs. The function "Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs".