Even though SQL wasn't designed for calculating statistics, it does include a number of basic statistical functions. In this tip, I'll define statistical terms and functions and then demonstrate how you can benefit from using SQL's built-in statistical functions.

Here are some basic statistics terms (which, if you're not very familiar with statistics, will come in handy when I define statistical functions below):

  • Population: the total number of values of interest.
  • Sample: When working with large populations, it's often impractical to use the whole population; therefore, statisticians use a sample extracted from the entire population. The sample must be large enough to accurately represent the population.
  • Mean: the average, which is calculated by dividing the values' sum by the values' count.
  • Mode: the value that occurs most frequently in the set that's being analysed. One problem with the mean is that none of the values may match it exactly.
  • Median: the value that is halfway between the lowest and highest values in the set; that is, the value with an equal number of values above and below it.
  • Standard deviation: measures the scatter of values with respect to the mean. When the standard deviation is small, the values lie very close to the mean; the larger it gets, the more values you'll find that are far away from the mean. To put it another way, with a large deviation the bell curve is low and wide; with a small deviation the bell curve is high and narrow.
  • Variance: the square of the standard deviation.

Now, here's a list of statistical functions (as well as some standard arithmetical functions that are useful in calculating statistics) and their meanings.

  • AVG(): returns the mean
  • COUNT(): returns the population (or sample, depending on the row source)
  • SQUARE(n): returns the square of the value specified
  • POWER(a,n): returns the value of a to the nth power
  • SQRT(n): returns the square root of n
  • SUM(): returns the sum of the values in a set
  • STDDEV(): returns the standard deviation of a sample
  • STDDEVP(): returns the standard deviation of a population
  • VAR(): returns the variance of a sample
  • VARP(): returns the variance of a population

Notice that there aren't any built-in functions for calculating mode and median. Nevertheless, you can derive these values with a little effort.

Using the Northwind sample database, let's suppose that we want to determine the mode of quantities ordered. Here's how we do it:

SELECT TOP 1 quantity, COUNT(*) Count
FROM [order details]
GROUP BY quantity
ORDER BY Count DESC
Calculating the median isn't much more difficult (although there is a wrinkle). The basic strategy is to join the table to itself on the column of interest, using one table to count the rows that are less than the given value, and using the other table to count the rows that are greater than that value. This is the query that calculates the median:
SELECT a.quantity median
FROM [order details] a, [order details] b
GROUP BY a.quantity
HAVING
SUM(CASE WHEN b.quantity <= a.quantity
THEN 1 ELSE 0 END) >=(COUNT(*))/2
AND
SUM(CASE WHEN b.quantity >= a.quantity
THEN 1 ELSE 0 END)>=(COUNT(*)/2)

Here's the wrinkle: The code above assumes that the number of rows is odd; therefore, there is guaranteed to be a middle row. When the number of rows is even, you have a choice of which row to return. In my experience, most statisticians return the lower of the two middle values.

I'm not recommending that you write an entire statistics package using SQL. But, if your needs are relatively simple, you can use the built-in functions, build upon them, and derive the basic statistical values. With more work and careful planning, you can take this even further.

Arthur Fuller has been developing database applications for more than 20 years. He frequently works with Access ADPs, Microsoft SQL 2000, MySQL, and .NET

Do you need help with SQL Server? Gain advice from Builder AU forums

Related links

Comments

1

Toni van Heerden - 03/11/04

The query you have posted on the website to calculate the median does not run on the Northwind sample database. Please will you provide more details on how to calculate the median

» Report offensive content

2

Vivek Prakash - 30/08/06

Can you tell what will the query if we want to calculate the product?

» Report offensive content

3

Vince - 18/05/07

I got question about median. Here is a exercise i got from a book.
query Q1 obtains median m1 of a set S1 values.
query Q2 obtains median m2 of subset S2 of S1.
if m1 < m2, what can be inferred about S1, S2, and the elements of S1 not in S2 ?
I guess I am still confused with how median works.

» Report offensive content

Leave a comment

You must read and type the 6 chars within 0..9 and A..F

* indicates mandatory fields.

3

Vince - 18/05/07

I got question about median. Here is a exercise i got from a book. query Q1 obtains median m1 of a set ... more

2

Vivek Prakash - 30/08/06

Can you tell what will the query if we want to calculate the product? ... more

1

Toni van Heerden - 11/03/04

The query you have posted on the website to calculate the median does not run on the Northwind sample database. Please ... more

Log in


Sign up | Forgot your password?

  • Staff XP stays on life support for longer

    This week's Roundup looks at Microsoft's decision to extend the life of Windows XP, the release of Microsoft Surface SDK, Firefox's new Geode plug-in, Yahoo's new tool -- Smush It and more. Read more »

    -- posted by Staff

  • Chris Duckett The good and truly awful celluloid depictions of computers

    Ever wonder why your lawyer uncle leaves the room whenever you turn over to Boston Legal? Or why your forensic science cousin can't stand crime drama? You know the answer: it’s the horrid trivialisation and dumbing down of an occupation to make it appear entertaining. Sometimes it is so unbelievable that it actually hurts and yelling at the screen is the only outlet. Read more »

    -- posted by Chris Duckett

  • Brendon Chase Apple's iPhone engineers to tour Sydney, Melbourne

    Aussie developers will be able to get up close and personal with some of the iPhone engineers in November to learn how to build applications for the platform. Read more »

    -- posted by Brendon Chase

What's on?