SQL Server : aggregate functions and GROUP BY in long queries

14,347

Solution 1

Subqueries and common table expressions (CTEs) are two ways to isolate the aggregation(s) down to a localized spot that is easy to join up to the main SELECT statement.

Correlated/synchronized subquery example:

SELECT c.contract_id, c.contractnum, c.customernum,
    ( SELECT SUM( li.price * li.orderquantity ) 
     FROM lineitems li 
     WHERE li.contract_id = c.contract_id
     ) AS grand_total
FROM contracts c 
WHERE ...

Referencing the correlated subquery in your SELECT list is a convenient way to bring back a single value.

Joining to a CTE:

WITH ordersummary AS 
( SELECT li.contract_id, 
      COUNT( DISTINCT li.partnum ) AS distinctproducts, 
      SUM( li.orderquantity ) AS itemcount,
      SUM( li.price * li.orderquantity) AS totalprice,
      MIN( p.ship_ready_date ) AS earliest_partial_ship_date,
      MAX( p.ship_ready_date ) AS earliest_complete_ship_date
      FROM lineitems li 
      INNER JOIN parts p 
          ON p.partnum = li.partnum
      GROUP BY li.contract_id
)
SELECT c.contract_id, c.contractnum, c.customernum,
s.distinctproducts, s.itemcount, s.totalprice,
s.earliest_partial_ship_date, s.earliest_complete_ship_date
FROM contracts c 
INNER JOIN ordersummary s
ON s.contract_id = c.contract_id
WHERE ...

Joining to an expression:

SELECT c.contract_id, c.contractnum, c.customernum,
s.distinctproducts, s.itemcount, s.totalprice,
s.earliest_partial_ship_date, s.earliest_complete_ship_date
FROM contracts c 
INNER JOIN
( SELECT li.contract_id, 
      COUNT( DISTINCT li.partnum ) AS distinctproducts, 
      SUM( li.orderquantity ) AS itemcount,
      SUM( li.price * li.orderquantity) AS totalprice,
      MIN( p.ship_ready_date ) AS earliest_partial_ship_date,
      MAX( p.ship_ready_date ) AS earliest_complete_ship_date
      FROM lineitems li 
      INNER JOIN parts p 
          ON p.partnum = li.partnum
      GROUP BY li.contract_id
) AS s
ON s.contract_id = c.contract_id
WHERE ...

Solution 2

I have two suggestions

1) Add a View to the database. This View is made up of a "computed" column and other columns that you require. The computed column is a result of concatenating the columns together from a query with all your joins. For instance "SELECT CONCAT(col1,col2), col3 FROM x JOIN y ... z ..."

You can make this a materialized view, which will give better performance since its saved on the disk. And thereby you avoid joining on the fly.

2) Try and avoid grouping by all the fields in the first place. For instance find out what is unique for each row, which you need the sum for. Maybe you do not need to return all these fields. And in the case that you do, you can start by grouping them and later join with the tables to return the columns you require.

EDIT: I agree with Fred Sobotka that subqueries is a good way to go. But it depends on what your data looks like. In general you should try and limit the number of joins, especially if you have a large dataset. Joins tend to be slow.

Share:
14,347
Tool
Author by

Tool

Updated on June 28, 2022

Comments

  • Tool
    Tool almost 2 years

    I was writing a query the other day in SQL Server, and had to get a sum of a column added to the query.

    Now, the query I had to add the sum column already had about 20 selected columns.

    In order for the sum (aggregate function) to work, I had to add GROUP BY for each pre selected column.

    This ended up as an extremely tedious and boring task (see the query below). I had to add about 50 new lines to the query just to get SUM for one column to work.

    Is there a way to automate writing such queries using SQL Server Management Studio?

    Is this the same with other databases (such as PL/SQL, Oracle, MySQL?)

    This is what the query looks like.

    SELECT 
           mvLogisticContracts.[mvLogisticContract_id]
          ,mvLogisticContracts.[CONTRACTNUMBER] 
          ,mvLogisticContracts.[CUSTNMBR]
          ,mvLogisticContracts.[DateCreated]
          ,mvLogisticContracts.[TruckAllocatedTotal]      ,mvLogisticContracts.[mvLOgisticAddress_id_StartOriginal]
              ,mvLogisticContracts.[mvLogisticAddress_id_StartOriginal]       ,mvLogisticContractLineItems.[LineItemQuantity]     ,mvLogisticContractLineItems.[LineItemNo]
              ,mvLogisticContractLineItems.[OffloadRequired]
              ,mvLogisticContractLineItems.[CarrierRequested]     ,mvLogisticContractLineItems.[RequestedSerialNo]
              ,mvLogisticContractLineItems.[mvLogisticAddress_id_Start]
              ,mvLogisticContractLineItems.[DateReadyBy]      ,mvLogisticContractLineItems.[DateCustomerRequested],
              mvLogisticContractLineItems.[mvLogisticContractLineItem_id]
          ,mvLogisticSalespersons.[FirstName],mvLogisticSalespersons.[LastName],
          mvLogisticServiceTypes.ServiceType,
          mvLogisticStatuses.Description,
          mvLogisticSKUs.[SKU],       mvLogisticSKUs.[Length],
          a1.CITY as \"CityStart\", a1.AddressName as \"AddressNameStart\", a1.Address1 as \"Address1Start\", a1.STATE as \"StateStart\", a1.ZIP as \"ZipStart\",
          a2.CITY as \"CityEnd\", a2.AddressName as \"AddressNameEnd\", a2.Address1 as \"Address1End\", a2.STATE as \"StateEnd\", a2.ZIP as \"ZipEnd\",
          mvLogisticContracts.[mvLogisticAddress_id_Start] AS aa1,
          mvLogisticContracts.[mvLogisticAddress_id_End] AS aa2,
          SUM(mvLogisticReleases.ReleaseQuantity) as ReleaseQtySum
                FROM mvLogisticContractLineItems
                LEFT JOIN mvLogisticContracts ON mvLogisticContractLineItems.mvLogisticContract_id = mvLogisticContracts.mvLogisticContract_id
                LEFT JOIN mvLogisticSalespersons ON mvLogisticContracts.[mvLogisticSalesperson_id] = mvLogisticSalespersons.[mvLogisticSalesperson_id]
                LEFT JOIN mvLogisticServiceTypes ON mvLogisticContractLineItems.mvLogisticServiceType_id = mvLogisticServiceTypes.mvLogisticServiceType_id
                LEFT JOIN mvLogisticStatuses ON mvLogisticContractLineItems.mvLogisticStatus_id = mvLogisticStatuses.mvLogisticStatus_id
                LEFT JOIN mvLogisticSKUs ON mvLogisticContractLineItems.[SKU_id] = mvLogisticSKUs.[SKU_id]
                LEFT JOIN mvLogisticAddresses a1 ON a1.[mvLogisticAddress_id] = mvLogisticContractLineItems.[mvLogisticAddress_id_Start]
                LEFT JOIN mvLogisticAddresses a2 ON a2.[mvLogisticAddress_id] = mvLogisticContractLineItems.[mvLogisticAddress_id_End]
                /*LEFT JOIN mvLogisticAddresses a1 ON a1.[mvLogisticAddress_id] = mvLogisticContracts.[mvLogisticAddress_id_Start]
                LEFT JOIN mvLogisticAddresses a2 ON a2.[mvLogisticAddress_id] = mvLogisticContracts.[mvLogisticAddress_id_End]*/
                LEFT JOIN mvLogisticReleases ON mvLogisticContractLineItems.mvLogisticContractLineItem_id = mvLogisticReleases.mvLogisticContractLineItem_id
                GROUP BY
                mvLogisticContracts.[mvLogisticContract_id]
          ,mvLogisticContracts.[CONTRACTNUMBER] 
          ,mvLogisticContracts.[CUSTNMBR]
          ,mvLogisticContracts.[DateCreated]
          ,mvLogisticContracts.[TruckAllocatedTotal]      ,mvLogisticContracts.[mvLOgisticAddress_id_StartOriginal]
              ,mvLogisticContracts.[mvLogisticAddress_id_StartOriginal]       ,mvLogisticContractLineItems.[LineItemQuantity]     ,mvLogisticContractLineItems.[LineItemNo]
              ,mvLogisticContractLineItems.[OffloadRequired]
              ,mvLogisticContractLineItems.[CarrierRequested]     ,mvLogisticContractLineItems.[RequestedSerialNo]
              ,mvLogisticContractLineItems.[mvLogisticAddress_id_Start]
              ,mvLogisticContractLineItems.[DateReadyBy]      ,mvLogisticContractLineItems.[DateCustomerRequested],
              mvLogisticContractLineItems.[mvLogisticContractLineItem_id]
          ,mvLogisticSalespersons.[FirstName],mvLogisticSalespersons.[LastName],
          mvLogisticServiceTypes.ServiceType,
          mvLogisticStatuses.Description, 
          mvLogisticSKUs.[SKU],       
          mvLogisticSKUs.[Length], 
          a1.CITY, a1.AddressName, a1.Address1, a1.STATE, a1.ZIP, 
          a2.CITY, a2.AddressName, a2.Address1, a2.STATE, a2.ZIP, 
          mvLogisticContracts.[mvLogisticAddress_id_Start], 
          mvLogisticContracts.[mvLogisticAddress_id_End]
    
  • Tool
    Tool over 11 years
    Edit: I think I've found what I'm looking for (by using subqueries which you mentioned); pastebin.com/LjWyCgak. Thanks alot.
  • Tool
    Tool over 11 years
    I think I solved it by changing the query to this: pastebin.com/m2ArmxGc
  • Fred Sobotka
    Fred Sobotka over 11 years
    I've added some examples that use table expressions, which enable you to retrieve multiple columns at one time.
  • Mike Parkhill
    Mike Parkhill over 11 years
    If this answered your question you should mark it as "Accepted" so others can learn from it too.