Best Way To Calculate A Median In Access 2007 When Using Group By


Solution 1

There is no Median function in Jet SQL, unless one has been added for 2007, but here is an idea of how to get one. You will need ...

Some SQL ...

SELECT Statistics.Month, Sum([SentTo]) AS [Sum Sent], fMedian("Statistics","Month",[Month],"SentTo") AS [Median Sent]
FROM Statistics
GROUP BY Statistics.Month;

And a User Defined Function (UDF).

Function fMedian(SQLOrTable As String, GroupFieldName As String, _
                 GroupFieldValue As Variant, MedianFieldName As String) As Variant
    Dim db As DAO.Database
    Dim rs1 As DAO.Recordset
    Dim rs As DAO.Recordset
    Dim varMedian1 As Variant

    Set db = CurrentDb
    Set rs1 = db.OpenRecordset(SQLOrTable, dbOpenDynaset)

    ' Delimit the group value to match its data type (# for dates, quotes for text)
    If IsDate(GroupFieldValue) Then
        GroupFieldValue = "#" & GroupFieldValue & "#"
    ElseIf Not IsNumeric(GroupFieldValue) Then
        GroupFieldValue = "'" & Replace(GroupFieldValue, "'", "''") & "'"
    End If

    rs1.Filter = GroupFieldName & "=" & GroupFieldValue
    rs1.Sort = MedianFieldName

    ' Filter and Sort only take effect in a recordset opened from the first one
    Set rs = rs1.OpenRecordset()
    If rs.EOF Then Exit Function    ' no rows for this group

    ' RecordCount is only accurate after visiting the last record
    rs.MoveLast
    rs.MoveFirst

    If rs.RecordCount Mod 2 = 0 Then
        ' Even number of rows: average the two middle values
        rs.Move rs.RecordCount \ 2 - 1
        varMedian1 = rs.Fields(MedianFieldName)
        rs.MoveNext
        fMedian = (varMedian1 + rs.Fields(MedianFieldName)) / 2
    Else
        ' Odd number of rows: take the single middle value
        rs.Move rs.RecordCount \ 2
        fMedian = rs.Fields(MedianFieldName)
    End If

    rs.Close
    rs1.Close
End Function

From: LessThanDot Wiki
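
For a quick sanity check, the UDF can also be called directly from the VBA Immediate window. This is only an illustrative call, assuming the Statistics table and fields used in the query above and a numeric Month value; substitute whatever value actually appears in the Month column:

?fMedian("Statistics", "Month", 1, "SentTo")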

Solution 2

Median can be calculated in MS Access using only regular queries, without any VBA. The median is the 50th percentile, so create two select queries as usual, then switch to SQL view and add "TOP 50 PERCENT" after the SELECT keyword: sort ascending to get the bottom 50 percent and descending to get the top 50 percent. Then take the maximum of the bottom-half result set and the minimum of the top-half result set; the average of those two values is the median (see the sketch below). When using "TOP 50 PERCENT", make sure the query criteria restrict the rows to exactly the set from which the median should be calculated.
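
A rough sketch of this approach, assuming the Statistics table and SentTo field from Solution 1 and a parameter prompt [Which Month]; the query names qryBottomHalf and qryTopHalf are illustrative, not part of the original answer. The bottom-half query, saved as qryBottomHalf:

SELECT TOP 50 PERCENT SentTo
FROM Statistics
WHERE [Month] = [Which Month]
ORDER BY SentTo ASC;

The top-half query, saved as qryTopHalf:

SELECT TOP 50 PERCENT SentTo
FROM Statistics
WHERE [Month] = [Which Month]
ORDER BY SentTo DESC;

And a final query that averages the two middle values:

SELECT (Max(B.SentTo) + Min(T.SentTo)) / 2 AS [Median Sent]
FROM qryBottomHalf AS B, qryTopHalf AS T;

Note that when the row count is odd, Access rounds TOP 50 PERCENT up, so the middle row lands in both halves and the average collapses to that single middle value (this is the rounding behaviour discussed in the comments below).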


Comments

  • Shawn H
    Shawn H almost 14 years
    Hi, first off this was very helpful as a starting point, thank you. The UDF has a small error. After "rs.Move (rs.RecordCount / 2)" you will need to add the line "rs.movePrevious". This will give you the correct Median as Excel would calculate it.
  • Henrik Erlandsson
    Henrik Erlandsson over 11 years
    A very nice idea :) But can you be sure the COUNT of the Top/Bottom 50% is rounded up when the number of rows is odd? If not, the ideal median - the value exactly in the middle - would be skipped.
  • tobriand
    tobriand over 10 years
    A couple of tests seem to suggest that at least in most cases, it does round up. That said, it's still very slow on large datasets...