Create range bins from SQL Server table for histograms
Solution 1
Simply add the Gender
column to your SELECT
and then do a GROUP BY
.
SELECT
Gender,
count(CASE WHEN Age>= 10 AND Age < 20 THEN 1 END) AS [10 - 20],
count(CASE WHEN Age>= 21 AND Age < 30 THEN 1 END) AS [21 - 30],
count(CASE WHEN Age>= 31 AND Age < 35 THEN 1 END) AS [31 - 35],
count(CASE WHEN Age>= 36 AND Age < 40 THEN 1 END) AS [36 - 40]
FROM Attendees AS AgeGroups
GROUP BY Gender
Solution 2
I recently came across a similar problem where I needed to look at several variables rather than just one, and my solution was to use a temporary table.
CREATE TABLE #bin (
startRange int,
endRange int,
agelabel varchar(10)
);
GO
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (10, 20, '10-20')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (21, 30, '21-30')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (31, 35, '31-35')
INSERT INTO #bin (startRange, endRange, mylabel) VALUES (36, 40, '36-40')
GO
SELECT
b.agelabel as ageBracket,
a.Gender,
count(a.Gender) as total
FROM
Attendees a
INNER JOIN
#bin b on (a.Age >= b.startRange and a.Age <= b.EndRange)
GROUP BY
b.agelabel, a.Gender
DROP TABLE #bin
GO
Or Alternatively, and probably the better solution,
With table1 as
(
SELECT
CASE
WHEN Age >= 10 and Age <= 20 then '10-20'
WHEN Age > 20 and Age <= 30 then '21-30'
WHEN Age > 30 and Age <= 35 then '31-35'
WHEN Age > 35 and Age <= 40 then '36-40'
ELSE 'NA'
End as ageBracket,
Gender
FROM
Attendees
)
SELECT
ageBracket,
Gender,
Count(Gender),
FROM
table1
GROUP BY
ageBracket, Gender
Where the result would be :
AgeBracket Gender Total
10-20 M 0
10-20 F 0
21-30 M 12
21-30 F 10
31-35 M 9
31-35 F 12
36-40 M 6
36-40 F 7
You can use the first select statement to gather all the data of your choice, while using the second query to perform any necessary calculations.
I think these solutions might be a bit overkill for your problem, but as it was the only question I found concerning binning, hopefully it will be useful to others !
Solution 3
The example above excludes values 20, 30, 35, & 40. There is more than one way to skin this cat, here is one:
SELECT
Gender,
count(CASE WHEN Age> 9 AND Age <= 20 THEN 1 END) AS [10 - 20],
count(CASE WHEN Age> 20 AND Age <= 30 THEN 1 END) AS [21 - 30],
count(CASE WHEN Age> 30 AND Age <= 35 THEN 1 END) AS [31 - 35],
count(CASE WHEN Age> 35 AND Age <= 40 THEN 1 END) AS [36 - 40]
FROM Attendees AS AgeGroups
GROUP BY Gender
Related videos on Youtube
user10901
Updated on April 17, 2020Comments
-
user10901 about 4 years
I have the following table in SQL Server:
----------------------------- ID Age Gender 1 30 F 2 35 M 3 32 M 4 18 F 5 21 F
What I need to do is to execute a query that will group the records in given ranges and count the occurences. The results need to be displayed later in a histogram chart (bar chart). I tried a query similar to the following:
SELECT count(CASE WHEN Age>= 10 AND Age < 20 THEN 1 END) AS '10 - 20', count(CASE WHEN Age>= 21 AND Age < 30 THEN 1 END) AS '21 - 30', count(CASE WHEN Age>= 31 AND Age < 35 THEN 1 END) AS '31 - 35', count(CASE WHEN Age>= 36 AND Age < 40 THEN 1 END) AS '36 - 40', FROM (SELECT Age FROM Attendees) AS AgeGroups
For the moment, this does the trick, but does not consider the gender column. It will result a single row which counts the frequency of every age group:
10-20 21-30 31-35 36-40 0 22 21 13
If gender would be considered there should be two records shown, for each gender. I need to see something like:
Gender 10-20 21-30 31-35 36-40 M 0 12 9 6 F 0 10 12 7
How should I approach this problem?
-
Pieter Geerkens over 10 yearsI demonstrated SQL code to generate a histogram directly here: stackoverflow.com/questions/16268441/…
-
-
Todd Meinershagen about 8 yearsI believe that you have to use the sum() function instead of the count() function. Otherwise, each range will equal the total number of records in the Attendees table. (btw - I have made this same mistake multiple times.)
-
AS91 over 7 years@ToddMeinershagen I'm guessing his table has a unique attendee per row as his sample shows, in which case
COUNT
andSUM
would work the same sinceSUM
does not get a value other than 1 for each row