Generating a histogram from column values in a database

Solution 1

SELECT grade, COUNT(grade) AS [Count] FROM [table] GROUP BY grade ORDER BY grade

I haven't verified it, but it should work. It will not, however, show a count for grade 6, since that grade is not present in the table at all.

Solution 2

If there are a lot of data points, you can also group ranges together like this:

SELECT FLOOR(grade/5.00)*5 As Grade, 
       COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1

Additionally, if you wanted to label the full range, you can get the floor and ceiling ahead of time with a CTE.

With GradeRanges As (
  SELECT FLOOR(Grade/5.00)*5     As GradeFloor, 
         FLOOR(Grade/5.00)*5 + 4 As GradeCeiling
  FROM TableName
)
SELECT GradeFloor,
       CONCAT(GradeFloor, ' to ', GradeCeiling) AS GradeRange,
       COUNT(*) AS [Grade Count]
FROM GradeRanges
GROUP BY GradeFloor, CONCAT(GradeFloor, ' to ', GradeCeiling)
ORDER BY GradeFloor

Note: Some SQL engines let you GROUP BY an ordinal column index, but MS SQL does not. Any non-aggregated expression in the SELECT list must also appear in the GROUP BY clause, which is why the range expression is repeated there.

Alternatively, you could use CASE expressions to count values into arbitrary bins, and then unpivot the result to get one row per bin with its count.
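A minimal sketch of the CASE-and-unpivot idea, assuming the same TableName and Grade column as above. The three bin boundaries and their labels are made up for illustration; pick whatever ranges suit your data.

```sql
-- Count grades into hand-picked bins (boundaries are illustrative assumptions),
-- producing a single row with one column per bin.
WITH Binned AS (
  SELECT SUM(CASE WHEN Grade < 3                THEN 1 ELSE 0 END) AS [Below 3],
         SUM(CASE WHEN Grade >= 3 AND Grade < 5 THEN 1 ELSE 0 END) AS [3 to 4],
         SUM(CASE WHEN Grade >= 5               THEN 1 ELSE 0 END) AS [5 and up]
  FROM TableName
)
-- Turn that single row into one row per bin.
SELECT GradeRange, [Grade Count]
FROM Binned
UNPIVOT ([Grade Count] FOR GradeRange IN ([Below 3], [3 to 4], [5 and up])) AS u
```

Unlike the FLOOR approach, the bins here do not have to be the same width, and a bin with no matching rows still shows up with a count of 0.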

Solution 3

Use a temp table to get your missing values:

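CREATE TABLE #tmp (num int)
DECLARE @num int
SET @num = 0
-- Fill #tmp with every grade value that should appear (0 through 9)
WHILE @num < 10
BEGIN
  INSERT INTO #tmp (num) VALUES (@num)
  SET @num = @num + 1
END

-- COUNT(g.Grade) skips the NULLs produced for grades with no rows, so those show 0
SELECT t.num AS [Grade], COUNT(g.Grade) AS [Count]
FROM gradeTable g
RIGHT JOIN #tmp t ON g.Grade = t.num
GROUP BY t.num
ORDER BY 1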

Solution 4

According to Shlomo Priymak's article "How to Quickly Create a Histogram in MySQL", you can use the following query:

SELECT grade, 
       COUNT(*) AS 'Count',
       RPAD('', COUNT(*), '*') AS 'Bar' 
FROM grades 
GROUP BY grade

Which will produce the following table:

grade   Count   Bar
1       2       **
2       1       *
3       1       *
4       1       *
5       1       *

Solution 5

Gamecat's use of DISTINCT seems a little odd to me; I will have to try it out when I'm back in the office...

The way I would do it is similar though...

SELECT
    [table].grade        AS [grade],
    COUNT(*)             AS [occurrences]
FROM
    [table]
GROUP BY
    [table].grade
ORDER BY
    [table].grade

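To overcome the lack of data where there are 0 occurrences, you can LEFT JOIN from a table containing all valid grades. Note that COUNT(*) counts the row the join produces for an unmatched grade (whose [table] columns are all NULL), whereas COUNT(grade) skips NULLs and so correctly reports 0.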

DECLARE @grades TABLE (
   val INT
   )  

INSERT INTO @grades VALUES (1)  
INSERT INTO @grades VALUES (2)  
INSERT INTO @grades VALUES (3)  
INSERT INTO @grades VALUES (4)  
INSERT INTO @grades VALUES (5)  
INSERT INTO @grades VALUES (6)  

SELECT
    [grades].val         AS [grade],
    COUNT([table].grade) AS [occurrences]
FROM
    @grades   AS [grades]
LEFT JOIN
    [table]
        ON [table].grade = [grades].val
GROUP BY
    [grades].val
ORDER BY
    [grades].val
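
To see the difference between COUNT(*) and COUNT(grade) after a LEFT JOIN, here is a small self-contained sketch. The @data table variable and its two rows are made up for illustration and stand in for the real table.

```sql
-- Hypothetical stand-ins: two valid grades, and data that only contains grade 1.
DECLARE @grades TABLE (val INT)
INSERT INTO @grades VALUES (1)
INSERT INTO @grades VALUES (6)

DECLARE @data TABLE (grade INT)
INSERT INTO @data VALUES (1)
INSERT INTO @data VALUES (1)

SELECT
    g.val          AS [grade],
    COUNT(*)       AS [count_star],   -- counts joined rows, including the all-NULL one
    COUNT(d.grade) AS [count_grade]   -- counts only rows where d.grade is not NULL
FROM
    @grades AS g
LEFT JOIN
    @data AS d
        ON d.grade = g.val
GROUP BY
    g.val
ORDER BY
    g.val
-- grade 1: count_star = 2, count_grade = 2
-- grade 6: count_star = 1, count_grade = 0
```

Grade 6 has no matching rows, yet COUNT(*) still reports 1 because the LEFT JOIN emits one NULL-padded row for it. That is why the histogram query counts the joined column instead.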
Author by

Thorsten79

Updated on November 12, 2020

Comments

  • Thorsten79 over 3 years

    Let's say I have a database column 'grade' like this:

    |grade|
    |    1|
    |    2|
    |    1|
    |    3|
    |    4|
    |    5|
    

    Is there a non-trivial way in SQL to generate a histogram like this?

    |2,1,1,1,1,0|
    

    where 2 means the grade 1 occurs twice, the 1s mean grades {2..5} occur once and 0 means grade 6 does not occur at all.

    I don't mind if the histogram is one row per count.

    In case it matters, the database is SQL Server, accessed by a Perl CGI script through unixODBC/FreeTDS.

    EDIT: Thanks for your quick replies! It is okay if non-existing values (like grade 6 in the example above) do not occur as long as I can make out which histogram value belongs to which grade.