Averages, Probability, and Reality

(Archive Question of the Week)

Recently I discussed the definition of the median of a data set, pointing out how it needs refinements that are not often discussed. In searching for questions in our archive on that topic, I ran across a discussion of an opposite issue: the breadth of the general term “average”, which does not have a specific definition. This is a nice example of how a seemingly simple question can lead to a discussion of wide-ranging topics that go off in different directions and end up tying many ideas together. That is part of the fun of being a Math Doctor.

What is an average?

Here is the question, from Danny in 2007:

What is the Meaning of "Average"?

Can you please give a detailed description of average and its meaning? I'm not looking for a definition like "average is a certain # divided by a certain total #."  I don't quite understand the real meaning of average.

A simple enough question, but one that could – and did – go in several directions, some of which were not initially apparent. In the end, we went beyond the definition of “average” in the sense of “arithmetic mean” as he defines it here, into other averages, and then into probability and beyond.

I started by discussing the various statistics that are sometimes called “averages”, and then answered his specific question about the deeper meaning of the mean:

There are several different meanings of "average".  The most general is a "measure of central tendency", meaning any statistic that in some sense represents a typical value from a data set.  The mean, median, and mode are often identified as "averages" in this sense.

The word "average" is also used (especially at elementary levels) to refer specifically to the mean, which is the kind of average you mentioned: add the numbers and divide by how many there are.  This kind of average has a specific meaning: it is the number you could use in place of each of the values, and still have the same sum.
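To make the "same sum" idea concrete, here is a minimal Python sketch. The five values are made up for illustration and do not come from the original exchange.

```python
from statistics import mean

# Hypothetical data (illustrative values only, not from the original discussion).
values = [2, 0, 1, 5, 2]

# The mean: add the numbers and divide by how many there are.
m = mean(values)

# Use the mean in place of each of the values.
replaced = [m] * len(values)

print(m)              # the mean of the data
print(sum(values))    # the original total
print(sum(replaced))  # the total after replacing each value by the mean
```

The last two printed totals agree, which is exactly what this sense of "average" promises.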

This is the sort of “average” that Danny initially referred to. I gave a brief example of what my definition meant, with links to fuller explanations of this sense of “average”, and of other sorts of “mean”:

What Does Average Mean?

Arithmetic vs. Geometric Mean

Applications of Arithmetic, Geometric, Harmonic, and Quadratic Means


Probability as an average

My comments triggered a new question, about the meaning of “central tendency”. This is a complex term that is easily misunderstood. He did something wise here: he showed me what he was thinking, so I could correct it, rather than just asking another question without explanation. This is part of good communication, and it leads to profitable discussions.

Thanks for the helpful response.  In your letter, you mentioned that average refers to central tendency.  Let me give my interpretation of what central tendency means.  Please correct me if I am wrong.  For example, if our data shows that it rains 10 times over 100 days, then it means that the sky "tends" to rain 10 times per 100 days.  10 divided 100 gives a frequency value of 0.1, which means that it rains 0.1 time per day on average.  This average refers to how frequently it rains.  For example, if it rained 11 times (more frequent than 10), then you would get 11/100, which is a bigger value than 10/100.  Thus, 11/100 is more frequent than 10/100.  Is this interpretation of central tendency correct?

I also think central tendency is the average value that tends to be close to MOST of the various values in the data.  For instance, if my data set is (4,6,1,3,0,5,3,4) the central tendency is 3.25, which is a value that tends towards 3 and 4.  There are two 3 values and two 4 values in the data, which make up most of the data set.

The word “tendency” had led Danny off in a new direction, verging on probability, which is not really what we mean by the term here; but it is connected to our topic (“it tends to rain 1 of 10 days, on average”), so this was not a huge stretch. But he seems to have missed the word “central”, which is the real key to the phrase.

There are several other little twists in Danny’s understanding here: Can it really rain 0.1 time in a day? (I didn’t use number of rainfalls, but inches of rain, in my examples. But we’ll get back to this question.) And does “central tendency” imply closeness to most of the data (which sounds more like the mode in particular)? I now had to dig deeper into what it does mean; my word “typical” above did not do much to clarify. Words are slippery, aren’t they?

What you are saying in both cases is a reasonable example of the mean, and fits with my description of average rainfall, though I used the inches of rain per day rather than the number of rainfalls.

But central tendency is intended to be a much broader term. It's meant to be vague, because it covers not only means but also the median, the midrange, and even the mode. Its meaning is "any statistic that tends to fall in the middle of a set of numbers"; anything that gives a sense of what the "usual" or "typical" value is, in some sense, can be called a measure of central tendency. The *median* is, literally, the number in the middle--put the numbers in order, and take the middle number in the list, or the average of the two middle numbers if necessary. So that's clearly a "central tendency". The *midrange* is the exact middle of the range--the average, in fact, of the highest and lowest numbers. So that, too, has to lie in the middle, though it doesn't take into account how the rest of the numbers are distributed. The *mode* is the most common value, if there is one; it really doesn't have to be "in the middle", or even to exist, but it certainly fits the idea of "typical". The (arithmetic) *mean*, like all the others, has to lie within the range of the numbers, and it represents the "center of gravity" of all the numbers. So each of these fits the meaning of "measure of central tendency", each in a different way.
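The four measures just described can be compared on Danny's own data set. This is only a sketch using Python's standard `statistics` module (with `multimode`, which needs Python 3.8 or later); the variable names are my own.

```python
from statistics import mean, median, multimode

# Danny's data set from his question.
data = [4, 6, 1, 3, 0, 5, 3, 4]

data_mean = mean(data)                        # 26 / 8 = 3.25
data_median = median(data)                    # average of the two middle values, 3 and 4
data_midrange = (min(data) + max(data)) / 2   # halfway between the lowest and highest values
data_modes = sorted(multimode(data))          # every value tied for most common

print("mean:    ", data_mean)
print("median:  ", data_median)
print("midrange:", data_midrange)
print("modes:   ", data_modes)
```

All four land near the middle of the data, but no two of them agree: the mean is 3.25, the median 3.5, the midrange 3, and there are two modes, 3 and 4.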

Now Danny turned a corner and moved fully into the topic of probability:

Hello doctor, thanks for your insights, I now have a better idea of average.  Here is one more question about probability.  Let's say that I was sick 40 times out of 1000 days.  So based on this information, the probability of me getting sick on a random day is 40/1000. ...  

This leads me to conclude that the probability of anything is based on the past data, and we can make good predictions of future events because of the law of continuity, meaning that things in the universe always follow a pattern.  If we lived in a universe without continuity, then the knowledge of probability is useless.

So if I was sick 40 times out of 1000 days in the past, then the probability of me getting sick on a random day is the average value of 40/1000 = 0.04.  I like to point out that the average 0.04 doesn't have a real physical meaning, because it says that I was sick on average 0.04 times per day (0.04 times? that makes no sense).  I think 0.04 is just a number that corresponds to or represents 40/1000 (40 times per 1000 days is meaningful).

So he wants confirmation of his concept of probability, which I gave:

What you're talking about here is called empirical probability: just a description of what actually happened, which can't say anything about why, or what could happen another time.  It's simply a ratio: how does the number of occurrences of sickness compare to the number of days under consideration?  Out of those 1000 days, 40 of them were sick days; so "on the average" 40 out of 1000, or 4 out of 100, or 1 out of 25 were sick days.  If they were evenly distributed--the same idea as a mean--then every 25th day would have been a sick day.
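The ratio in this answer can be checked with exact fractions. A minimal sketch, using only the numbers Danny gave (the variable names are illustrative):

```python
from fractions import Fraction

sick_days = 40
total_days = 1000

# Empirical probability: occurrences divided by the number of days considered.
p = Fraction(sick_days, total_days)
print(p)          # 1/25
print(float(p))   # 0.04

# If the sick days were spread perfectly evenly, one would occur every 1/p days.
days_per_sick_day = 1 / p
print(days_per_sick_day)  # 25
```

The same number therefore reads three ways: 40 out of 1000, 1 out of 25, or 0.04.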

I chose not to point out the difference between his wording (the number of times he got sick) and mine (the number of sick days). We often leave such details to be absorbed from what we say rather than confronting them directly, while trying to set an example of careful wording.

Probability and the real world

But does the concept of probability imply, or require, that the universe must follow predictable patterns? That raises some further questions about its validity, which I answered by dipping into the philosophical realm:

Now you've made some big jumps!  Not ALL of probability is just about past data; that's just empirical probability.  And we can't always extrapolate from past events to the future.  Sometimes that works, sometimes it doesn't.  In part, it's the job of statistics to look at the data you've got and determine how valid it is to expect the same probabilities to continue--how good a sample you have.  But even beyond that, whether we can assume that patterns will continue depends on other knowledge entirely, such as science.  If we find a mechanism that explains a pattern, we have much better grounds for expecting it to continue than if we don't.

To make a broad statement that "things in the universe ALWAYS follow a pattern" is to indulge in philosophy, not math.  In probability, we go the other way: we make an ASSUMPTION that things will continue as they are, in order to be able to apply probability to predicting anything; we leave it up to scientists (or sometimes philosophers) to decide whether that is a valid assumption.  The scientist will most likely do some experiments to see if the predictions based on his theory work out, and if so he has some evidence that it is valid, and he can continue to make predictions.  If not, then he tries another theory! He certainly would not say that probability forces him to believe that things work a certain way.

And perhaps that's what you mean to say: probability applies to a situation beyond the data we have only if there is consistency in the causes underlying the phenomena.

As we have said many times, math is not necessarily about the real world; it can be used to model the world based on observations of it, but the results must always be checked. Probability assumes a consistent world, saying, “If things continue the same, this is what we can expect.”

After pointing out the relationship between his comment about 0.04 times per day and the Law of Large Numbers, I returned to the connection of these ideas to averages:

The difference between this and the general idea of averages is that an average can apply to any collection of numbers, not just to the frequency of an occurrence.  We can talk about the average speed of a car; regardless of how its speed has varied along a route, we can use the total distance traveled and the total time it took to determine the average speed, which is the speed it might have been going throughout the entire trip, in order to get the same total distance in the same total time.  There is nothing probabilistic about this; but like probability, we are taking something that may vary "randomly" and condensing all its variations into a single number.  The average speed does not mean that at every moment the car was going that fast, and the probability does not mean that out of every 25 days you are sick on one of them, or, worse, that on every day you are sick for 1/25 of the time.  Averages and probability both ignore unevenness and look only at the big picture.

And that makes your question a very good one.  I've been noticing the connections between probability and averages in several areas lately, and it's good to have a chance to think more about it.
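The average-speed idea above can be sketched in a few lines of Python. The trip below is hypothetical; the three legs and their numbers are assumptions chosen only to illustrate the calculation.

```python
# Hypothetical trip: (distance in miles, time in hours) for each leg.
legs = [(30, 1.0), (120, 2.0), (30, 1.0)]

total_distance = sum(d for d, _ in legs)   # 180 miles
total_time = sum(t for _, t in legs)       # 4.0 hours

# Average speed: total distance divided by total time.
average_speed = total_distance / total_time

# The actual speed on each leg, for comparison.
leg_speeds = [d / t for d, t in legs]

print(average_speed)               # 45.0
print(leg_speeds)                  # [30.0, 60.0, 30.0]
print(average_speed * total_time)  # 180.0, the same total distance
```

In this made-up trip the car never actually travels at 45 mph on any leg, yet driving at a constant 45 mph for the same 4 hours would cover the same 180 miles.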

What is an average, really?

Finally, Danny got back to the topic of averages, with an excellent long question about things like this:

It seems like sometimes averages have no meaning.  For instance, in a class of 10 students, 2 got 100 on a test, 8 got 0.  The test average is 200/10 = 20.  So on average every person got a 20 on the test.  If I am correct in thinking that an average value is an estimate of the various values in the same data set (like you said, an average is like a center of gravity in the data set, so all the numbers in the data set should lean towards the average), then the average 20 is closer to the REAL scores of the 8 students who got 0 than to the REAL score of the 2 people who got 100.

This average gives a vague idea of how badly most people did, but it has "hidden" the two perfect scores.  The average may tell us that most of the people must have done badly so that the average comes out to be so low.  However, we can't know that some people did perfectly just by looking at the average.
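Danny's test example is easy to reproduce. A minimal sketch with Python's `statistics` module, using exactly his scores:

```python
from statistics import mean, median, mode

# Danny's class: 2 students scored 100 and 8 scored 0.
scores = [100] * 2 + [0] * 8

score_mean = mean(scores)      # 200 / 10 = 20
score_median = median(scores)  # both middle scores are 0
score_mode = mode(scores)      # the most common score

print(score_mean, score_median, score_mode)
print(score_mean in scores)    # False: nobody actually scored 20
```

The mean of 20 is a score that no student received, while the median and the mode are both 0 and hide the two perfect papers entirely.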

Note that he very nicely paraphrased “central tendency” as “all the numbers in the data set should lean towards the average” – I think he got it!

Here are some excerpts of my discussion of this new topic:

Several of the pages on our site that discuss mean, median, and mode talk about why you would choose one rather than another.  Each has its uses, and what you're saying is that for some purposes the mean is not the appropriate "measure of central tendency".  That doesn't mean that it is meaningless, or that it is never a valid concept; only that it doesn't tell you what you'd like to know in this situation.

Another classic example of this is median income.  If in your town 999 people earned $1000 a year, and one man earned $9,000,000 a year, the average (mean) income would be 10,000 a year, even though NOBODY made that amount.  [Oops - I meant 9,999.]  The median income gives a much better picture, if you want to know how the "average" person is doing; but that entirely misses the fact that there is one person who is rich.  No matter what "average" you use, you'll be leaving someone out.

Another example is the rainfall I like to use to illustrate the idea of the mean.  If the average rainfall is 1 inch a day, say, it might actually have been dry as a bone for 99 days, and then there was a 100 inch flood on the last day.  The average accurately reflects the TOTAL amount of rain over the 100 days, but that isn't all it takes to decide what plants can survive there.

Again, the whole idea of an average is to try to boil down a lot of information into one number.  That necessarily means that you have to lose some information.  (That's why people don't want to be treated as mere numbers; they are more complex than that.  Even a set of numbers doesn't like to be replaced by a single number!)

Incidentally, I've sometimes noticed in teaching, as a result of these statistics, that I can't "teach to the middle" of the class, because there is no middle.  Sometimes I find a bimodal distribution, which means that I have a lot of F's and a lot of B's, and no one in between where the median and the mean both lie.  (The last word there is an interesting, and very appropriate, pun!) So I have to ignore the statistics and teach the students.
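The income and rainfall examples in this excerpt can both be checked directly. This is only a sketch; the lists are built from the numbers in the text, and the variable names are my own.

```python
from statistics import mean, median

# Income example: 999 people earn $1000 a year, one person earns $9,000,000.
incomes = [1000] * 999 + [9_000_000]
income_mean = mean(incomes)      # 9,999,000 / 1000 = 9999
income_median = median(incomes)  # the two middle incomes are both 1000

# Rainfall example: 99 dry days, then 100 inches on the last day.
rainfall = [0] * 99 + [100]
rain_mean = mean(rainfall)       # 100 / 100 = 1 inch per day
rain_median = median(rainfall)   # the two middle days are both dry

print(income_mean, income_median)
print(rain_mean, rain_median)
```

The computed mean income is 9,999, matching the bracketed correction, and nobody in the town earns it. In the rainfall case the mean of 1 inch a day reflects the total correctly, while the median of 0 reflects the typical day; neither number alone tells the whole story.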

As always, I’ve left out a lot, so you’ll have to read the original if you are interested. Some of these long discussions bring out a lot that is worth pondering.
