How to create a histogram using MySQL
I think we can use the following general form to create a general histogram:
select (x div 4) * 4 as NewX, count(*) as NewY from histogram
group by NewX
Where x
is the real x value of the x axis and count(*)
is the real y value. The number 4
is the size amount of the x values we want to group. This means we will group all x values in groups of 4 (e.g.: group 1 is 0, 1, 2, 3; group 2 is 4, 5, 6, 7, and so on). The count of each item in the group will become the NewY value
You can play with this here
Applying this logic to your query this would be:
select (floor(Max_Irrad/10) div 4) * 4 as NewX, count(*) as NewY
from marctest.test_summarynimish
where Lcu_name='Allegro' and Lcu_Mode='Standard'
group by NewX
Let me know if you have any trouble or doubt about this.
dna
Updated on June 09, 2022Comments
-
dna about 2 years
I am trying to create a histogram data using following query:
SELECT FLOOR(Max_Irrad/10) AS bucket, COUNT(*) AS COUNT FROM marctest.test_summarynimish where Lcu_name='Allegro' and Lcu_Mode='Standard' GROUP BY bucket;
following is the result that i am getting:
bucket count 0 3 4 3 5 12 7 6 8 3 10 3
now the bucket field is the range or bin used in the histogram. I want to create a bucket values with consistent range, for eg starting from 0,4,8,12.... and so on.. Is there any way to achieve this in mysql? This is how I am expecting to have as result:
bucket count 0 3 4 21 8 6
-
dna over 12 yearsThe solution you provide is working thanks a lot.... :) that would be really very helpful
-
LionHeart over 11 yearsAbove method is correct except it will not report buckets having zero samples.
-
juacala over 10 yearsAlso, if you're bin size is a non-integer it won't work. You'll need instead floor(value/binSize)*binSize for non-integer bin sizes.
-
Mosty Mostacho over 10 years@LionHeart That makes sense. If there are no rows that contain data then it is not possible to display those missing rows, right? :) If you want to know how to generate data in MySQL I've written a long explanation with a couple of alternatives in this other question. Anyway, the missing data can be easily and more efficiently zero-filled when formatting the results in the histogram.