Calculating multiple column average in SQLite3

10,782

Solution 1

In standard SQL

SELECT 
(SUM(q7)+SUM(q33)+SUM(q38)+SUM(q40)+..+SUM(q119))/
(COUNT(q7)+COUNT(q33)+COUNT(q38)+COUNT(q40)+..+COUNT(q119)) AS domain_score1 
FROM survey

would give you what you want SUM will coalesce to 0 if null and COUNT will not count NULLs. (hope SQLite3 conforms).

EDIT: Checked the http://www.sqlite.org/lang_aggfunc.html and SQLite conforms; if sum() is going to overflow you could use total() instead.

Also I second the opinions re normalization, if you don't normalize your table design (and whenever you see columns with numbers in their names raise a red flag) you are not going to have elegant SQL.

Solution 2

AVG already ignores nulls and does what you want:

The avg() function returns the average value of all non-NULL X within a group. String and BLOB values that do not look like numbers are interpreted as 0. The result of avg() is always a floating point value as long as at there is at least one non-NULL input even if all inputs are integers. The result of avg() is NULL if and only if there are no non-NULL inputs.

From http://www.sqlite.org/lang_aggfunc.html

So you can probably take the values you have per domain and load them into another table and then just run averages on that table. Or you could just unpivot your wide table and run averages on that, too.


AVG works on columns, not rows. So if you unpivoted your table you could use AVG and not have the problem you're facing. Let's look at a small example:

You have a table and it looks like this:

ID  | q1  | q2  | q3
----------------------
1   | 1   | 2   | NULL
2   | NULL| 2   | 56

You want to average q1 and q2 together because they're in the same domain, but they're separate columns so you can't. But if you changed your table to look like this:

ID  | question | value
-----------------------
1   | 1        | 1
1   | 2        | 2
1   | 3        | NULL
2   | 1        | NULL
2   | 2        | 2
2   | 3        | 56

Then you could take the average of the two questions easily:

SELECT AVG(value)
FROM Table
WHERE question IN (1,2)

And you can group by ID if you want an average per ID rather than a global average:

SELECT ID, AVG(value)
FROM Table
WHERE question IN (1,2)
GROUP BY ID

Solution 3

This is going to be a monstrous query, but you could do this:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey) UNION ALL
(SELECT q33 FROM survey) UNION ALL
(SELECT q38 FROM survey) UNION ALL
...
(SELECT q119 FROM survey))

This converts your columns to rows and uses the AVG() function.

Of course, you probably want this for just a particular survey record, so don't forget the WHERE clause:

SELECT AVG(q) FROM
((SELECT q7 AS q FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q33 FROM survey WHERE survey_id = 1) UNION ALL
(SELECT q38 FROM survey WHERE survey_id = 1) UNION ALL
...
(SELECT q119 FROM survey WHERE survey_id = 1))

You'd have a lot easier time if you normalized the q columns into their own table, with one question per row, and references back to survey. You'd have a 1 to many relationship between survey and question.

Solution 4

Use a separate table to store survey scores for different questions (assuming that q's are because of question). Something like following

SurveyTable(SurveyId, ...)
SurveyRatings(SurveyId, QuestionId, Rating)

After that you can run query like

SELECT avg(Rating) WHERE SurveyId=?
Share:
10,782
Benjamin Oakes
Author by

Benjamin Oakes

Ruby, JavaScript, and Web. If I made a contribution that helped you, please consider Flattr logo http://api.flattr.com/button/flattr-badge-small.png leaving a tip via Flattr. Maid I maintain the open source Maid project: Be lazy. Let Maid clean up after you, based on rules you define. It's easy to install on a Mac or Linux. See instructions at http://rubygems.org/gems/maid TabCarousel I maintain an open source Chrome extension to help you keep tabs on info you want to monitor. Great for a TV or other external display. Install it from the Chrome Web Store.

Updated on June 28, 2022

Comments

  • Benjamin Oakes
    Benjamin Oakes almost 2 years

    I need to average some values in a row-wise fashion, rather than a column-wise fashion. (If I were doing a column-wise average, I could just use avg()). My specific application of this requires me ignore NULLs in averaging. It's pretty straightforward logic, but seems awfully difficult to do in SQL. Is there an elegant way of doing my calculation?

    I'm using SQLite3, for what it's worth.

    Details

    If you need more details, here's an illustration:

    I have a a table with a survey:

    | q1 | q2    | q3    | ... | q144 |
    |----|-------|-------|-----|------|
    | 1  | 3     | 7     | ... | 2    |
    | 4  | 2     | NULL  | ... | 1    |
    | 5  | NULL  | 2     | ... | 3    |
    

    (Those are just some example values and simple column names. The valid values are 1 through 7 and NULL.)

    I need to calculate some averages like so:

    q7 + q33 + q38 + q40 + ... + q119 / 11 as domain_score_1
    q10 + q11 + q34 + q35 + ... + q140 / 13 as domain_score_2
    ...
    q2 + q5 + q13 + q25 + ... + q122 / 12 as domain_score_14
    

    ...but i need to pull out the nulls and average based on the non-nulls. So, for domain_score_1 (which has 11 items), I would need to do:

    Input:  3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1
    
    (3 + 5 + 7 + 2 + 3 + 1 + 5 + 1) / (11 - 3)
    27 / 8
    3.375
    

    A simple algorithm I'm considering is:

    Input:

    3, 5, NULL, 7, 2, NULL, 3, 1, 5, NULL, 1 
    

    Coalesce each value to 0 if NULL:

    3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
    

    Sum:

    27
    

    Get the number of non-zeros by converting values > 0 to 1 and sum:

    3, 5, 0, 7, 2, 0, 3, 1, 5, 0, 1
    1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1
    8
    

    Divide those two numbers

    27 / 8
    3.375
    

    But that seems like a lot more programming than this should take. Is there an elegant way of doing this that I'm not aware of?

    Update:

    Unless I'm misunderstanding something, avg() won't work for this. Example of what I would want to do:

    select avg(q7, q33, q38, ..., q119) from survey;
    

    Output:

    SQL error near line 3: wrong number of arguments to function avg()