Students commonly expect that textbooks all say the same thing (in fact, some think they can ask us about “Theorem 6.2” and we’ll know what they’re talking about!). The reality is that they can even give conflicting definitions, depending on the perspective from which they approach a topic. Here, I want to show how and why they differ in talking about intervals on which a function is increasing or decreasing. Let’s see if we can resolve the “fight”.

This one page in our archive actually contains two of the three answers to the question in our archive, the second being a challenge to the first. Let’s start with the 2009 question from Bizhan, answered by Doctor Minter:

Endpoints of Intervals Where a Function is Increasing or Decreasing

Why do some calculus books include the ends when determining the intervals in which the graph of a function increases or decreases, while others do not? I feel that the intervals should be open and the ends should not be included as they may be, for example, stationary points where a horizontal tangent can be drawn. I noticed that AP Central always includes the ends in the formal solutions of such problems (one can see many examples there), but an author like Howard Anton never does. Could you clarify this for me please? Can both versions be correct?

For example, for the function \(y = -x^3+12x\),

we see that it is increasing from -2 to +2. But should we describe that as the open interval \(\left(-2,2\right)\), or as the closed interval \(\left[-2,2\right] \)? Different textbooks give different answers.

Doctor Minter began:

You pose an excellent question, and I agree that the discrepancy among the various textbooks is quite misleading. I completely agree with your claim that the intervals should be open (that is, should not include the endpoints). Let me attempt to give comprehensive reasoning as to why this should be. We use derivatives to decide whether a function is increasing and/or decreasing on a given interval. Intervals where the derivative is positive suggest that the function is increasing on that interval, and intervals where the derivative is negative suggest that the function is decreasing on that interval.

He goes on to explain that in order for the derivative to be positive at a point, the derivative must *exist* there, so the function must be *defined*, and *increasing*, and *differentiable*, in some interval *around* that point. So he concludes,

In summary, for a function to be increasing (all of these concepts are similar for decreasing intervals as well), we have to be able to show that the function is greater for larger values of "x," and less for smaller values of "x" in a small neighborhood around each point in the interval. An endpoint cannot have both of these properties.

It is important to note that both the question and the answer come from the perspective of **calculus**, and depend on defining “increasing” in terms of the derivative.

But that is not the *only* way to define “increasing on an interval”!

Back in 1997, Doctor Jerry had answered a similar question, pertaining to the rules for AP calculus:

Brackets or Parentheses?

We have been discussing a problem in my Advanced Placement Calculus class. It concerns increasing/decreasing functions as well as concave up/concave down. ... When expressing these answers as an interval, should I use a bracket, symbolizing that the endpoint is included, or a parenthesis, symbolizing that the endpoint is not included?

Doctor Jerry replied (in part):

Different books, teachers, and mathematicians use slightly different definitions of increasing functions, but this is not a matter of much consequence as long as one is consistent. Suppose your definition of an increasing function is: f is increasing on an interval I if for each pair of points p and q in I, if p < q, then f(p) < f(q). Note that I may be open (a,b), half-open [a,b) or (a,b], or closed [a,b]. Consider f(x) = x^2, defined on R. The usual tool for deciding if f is increasing on an interval I is to calculate f'(x) = 2x. We use the theorem: if f is differentiable on an open interval J and if f'(x) > 0 for all x in J, then f is increasing on J. Okay, let's apply this to f(x) = x^2. Certainly f is increasing on (0,oo) and decreasing on (-oo,0). What about [0,oo)? The theorem, as stated, is silent. However, one can go back to the definition of increasing. To show that f is increasing on I = [0,oo), let u and v be in I and u < v. If 0 < u, then the theorem applies. Otherwise, 0 = u < v and we see that f(u) = 0 < f(v) = v^2. Many instructors, books, and even AP exams often skip consideration of endpoints. If you want to be ultra-safe, then you can do the above kind of analysis. It just takes a few extra steps, usually easy, past the standard test.

Note that here, the question and answer are still in the context of calculus, but Doctor Jerry defines “increasing” *without* calculus, and then applies a *theorem* that relates this definition to the derivative: *If* the derivative is positive on an interval, *then* the function is increasing on that interval. This theorem doesn’t tell us anything about intervals in which the derivative is sometimes zero, so we have to fall back on the definition, not the theorem. And he concludes that his function is increasing on the half-closed interval \(\left[0,\infty\right)\).
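Doctor Jerry's endpoint argument can be mimicked numerically. The sketch below is my own illustration (the helper name is mine, not from the exchange); it applies the order definition of "increasing" to f(x) = x² on sample points of [0, ∞), endpoint included, and shows that the check fails only once we cross to the left of 0:

```python
import itertools

def is_strictly_increasing(f, points):
    """Order definition: f(u) < f(v) for every sampled pair u < v.
    A numeric spot-check, of course, not a proof."""
    pts = sorted(points)
    return all(f(u) < f(v) for u, v in itertools.combinations(pts, 2))

f = lambda x: x * x
samples = [0.0, 0.5, 1.0, 2.0, 10.0]               # the endpoint 0 is included
print(is_strictly_increasing(f, samples))          # True on [0, oo) samples
print(is_strictly_increasing(f, [-1.0] + samples)) # False once we cross 0
```

Note that the endpoint 0 causes no trouble at all under this definition, exactly as Doctor Jerry's u = 0 case showed.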

There seems to be a difference of opinion here! Let's bring the two answers together and compare them.

In 2014, Kevin read Doctor Minter’s answer, and questioned it:

Doctor Minter gave an argument for why endpoints should *not* be included when determining intervals where a function is increasing or decreasing. Implicit in your answer is that "increasing at a point" means "has a positive derivative in a neighborhood of that point." I wonder if it makes sense to define increasing at a point. I also wonder about another definition that I came up with (which, I acknowledge, doesn't work for a point). We could define increasing for an interval [a, b] as: whenever x and y are in [a, b] then f(x) < f(y). This makes no reference to derivatives, so you could still talk about a function being increasing even if it fails to be differentiable some places (e.g., we could say x^(1/3) is increasing everywhere). I'm also thinking about piecewise functions with jumps; again, it seems like we should be able to say they're increasing even if the derivative doesn't exist everywhere. With this second definition, it seems to me that if a function is increasing on (a, b) and continuous at a and b, then it would be guaranteed to be increasing on [a, b]. What do you think? Do we need to have a definition of "increasing at a point" for some reason? Is there any way to reconcile these two definitions? There doesn't seem to be consensus here. It's a basic calculus concept and there seem to be two (very convincing) ways of looking at it that are in conflict.

This was a very perceptive question. There are really two different concepts here, just as Kevin suggested. The concept of increasing *on an interval* does not require calculus, and applies to functions that are not differentiable. The concept of increasing *at a point* requires calculus, and is often what the authors of calculus books are really talking about; Doctor Minter took “increasing on an interval” to mean “increasing at every point in the interval” in this sense. [Doctor Fenton, in an unarchived 2007 answer, mentioned that “increasing at a point” can be defined instead as “f(a) is larger than any f(x) for x to the left of a, and f(a) is less than any f(x) when x is larger than a”. This makes it still independent of the derivative.]

I started by discussing the definition Kevin gave, which was (when clarified) identical with Doctor Jerry’s:

This is the proper definition of increasing on an interval, which applies to any function, and is found, for example, here:

http://mathworld.wolfram.com/IncreasingFunction.html

A function f(x) increases on an interval I if f(b) ≥ f(a) for all b > a, where a, b ∈ I. If f(b) > f(a) for all b > a, the function is said to be strictly increasing. ... If the derivative f'(x) of a continuous function f(x) satisfies f'(x) > 0 on an open interval (a, b), then f(x) is increasing on (a, b). However, a function may increase on an interval without having a derivative defined at all points. For example, the function x^(1/3) is increasing everywhere, including the origin x = 0, despite the fact that the derivative is not defined at that point.

They state the theorem Doctor Jerry used, and give the same example Kevin gave, which shows that the issue applies not only to endpoints, but to any isolated point where the derivative is zero or undefined. For reference, here is the graph of \(y = x^{1/3}\), the cube root:

Note that although its tangent is vertical at the origin, so it is not differentiable there, it is clearly increasing everywhere. Similarly, the cube function, \(y = x^{3}\), is increasing everywhere, although at the origin it is (momentarily) horizontal:
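Kevin's cube-root example is easy to probe numerically. This sketch is my own (with a hand-rolled real cube root, since Python's `**` goes complex for negative bases); it shows the difference quotient blowing up at 0, so there is no derivative there, while the order definition of increasing still holds right across 0:

```python
import math

def cbrt(x):
    """Real cube root, valid for negative x too (x**(1/3) would go complex)."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# Near 0 the difference quotient blows up: vertical tangent, no derivative.
h = 1e-12
print((cbrt(h) - cbrt(0.0)) / h > 1e6)   # True: slope of secant is enormous

# Yet the order definition of "increasing" holds right across x = 0:
pts = [-8.0, -1.0, 0.0, 1.0, 8.0]
print(all(cbrt(a) < cbrt(b) for a, b in zip(pts, pts[1:])))  # True
```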

I also commented on Kevin’s final paragraph about consensus:

There are actually two different concepts: a precalculus concept, applicable to any function; and a calculus concept, applicable to differentiable functions. I would prefer not to confuse them. Much as we distinguish uniform vs. pointwise continuity, these notions of increasing could be better distinguished like this:

* The function f is increasing on the interval [0, 1], meaning that comparing any two points, the one on the right is higher.
* The function f is increasing at every point on the interval (0, 1), meaning that the derivative is positive everywhere.

Then I quoted from a 2013 conversation I’d had in which this topic had arisen, in which I referred to both Doctor Minter’s and Doctor Jerry’s answers. The question there (in the midst of a long discussion with Aakarsh) made the conflict even stronger:

I have y = f(x). On the x-axis there are points a, b, and c. When x = a, y = 0; when x = b, y = 4; when x = c, y = 1. I realise the function increases on [a, b]. I also realise the function decreases on [b, c]. But why is the b in brackets? I know that they indicate closed intervals; that's no problem. If the graph increases from point a to point b, that is [a, b], but then the graph *MUST* decrease on (b, c]. If it increases TO "b," it decreases FROM "b" ... EXCEPT "b"?

Whoa! Can the function be both increasing and decreasing at the same point? If not, how would you decide which? (Notice that Aakarsh was taught to use closed intervals, and accepted that.) I first clarified the question:

I'll suppose that you were given a graph like this:

I think what you are asking is why they include the endpoints in the intervals. That's strange, because we've had other questions about why the endpoints are NEVER included in the interval of increase or decrease! Different texts have different policies on this. How does your text DEFINE "increasing on an interval"? Can you show me the first example they give? See these pages, which emphasize this variability among texts:

Brackets or Parentheses?
http://mathforum.org/library/drmath/view/53566.html

Endpoints of Intervals Where a Function is Increasing or Decreasing
http://mathforum.org/library/drmath/view/73202.html

I'm inclined to agree more with the first of these than the second; but I think in pre-calculus, it's a good idea to ignore this detail and either always use open intervals or always use closed intervals. It's not really an important issue; but your concern that a function can't be increasing AND decreasing at the same point would tilt me in the direction of using open intervals just to avoid confusing students like you! Really, however, you need to notice that your definition probably is only about increasing or decreasing IN AN INTERVAL, not AT A POINT. That is, they are not saying the function is increasing at b -- only that it is increasing in the interval [a, b]. So no claim is being made that the function is both increasing and decreasing at b! Once I see your book's definition, I can be more clear on that.

As I continued writing to Kevin,

This student never replied with his text's definition, so I didn't get to explore the details with him. One such detail would have been the distinction between saying that a function is increasing on an interval and saying that that is a maximal interval, in the sense that there is no containing interval (open or closed) on which it is increasing. In my experience, texts leave a lot unstated. While these omitted details might keep things simple for the less mature student, they would be worth exploring with a curious, capable one like you!

As I see it, when a textbook (particularly at the precalculus level) asks for “*the* interval on which f is increasing”, it is not asking for *any* such interval, but for the *largest*. Ideally it would say that; but often they will rely on our instinct to give the “best” answer possible – just as when we are asked what shape a square is, we don’t say “a rectangle”, but give the most precise answer that fits. If they have defined “increasing” as I have, then the closed interval is the correct answer. But there is room for disagreement, especially among students who are thinking more informally.

Some texts, to avoid confusion about endpoints, will specify that they are asking for the largest *open* interval on which the function is increasing. I think this is the best solution at that level. At a higher level, where precision of definitions is important, just state your definition and act on it.

It happens that, two months before Kevin wrote, someone else had written about the same issue, suggesting that we should add to Doctor Minter’s answer some information about the reasons for different answers. Kevin’s question gave the occasion to do just that. In corresponding with Ken about his suggestion, I said the following:

I agree with you on your point about different definitions. That's something I often emphasize in my answers; and it's also an explanation for our having answers on our site that don't agree. I like to refer "patients" to past answers, in part to show them that the same problem can be looked at from different perspectives. I also recognize that each answer has its own context, answering a particular student's question either in the light of that student's level or, perhaps, that Math Doctor's personal context. Our goal is not, generally, to give a comprehensive survey of a topic covering all possible variations, but to show a variety of individual interactions. No one answer will cover everything I wish it did (even if it's one I wrote myself a couple years ago). So I just write another and link to the old one(s).

On Doctor Minter’s answer in particular, I said this:

I think Dr. Minter was probably assuming the discussion is about maximal intervals; he also seems to be assuming the function is differentiable, which is not necessary for an increasing function. Actually, that's my own main objection to his answer: he hasn't actually defined what he means by increasing on an interval, or what context he is assuming. The answer to many questions depends heavily on the context, and we seldom get enough context to be able to give a perfectly appropriate answer. I personally try to state my assumptions up front (if I don't just ask for the definitions and context), so it can be clear what we are talking about -- especially if I am thinking of having my answer archived.

Here is a 2005 question:

Why Is Slope Rise Over Run and Not Run Over Rise?

Someone asked our math teacher yesterday, why is it rise over run and not run over rise... I was wondering since our teacher didn't know the answer, if you might. If you could answer this that would be so great!!! I think that maybe it could be both ways because like for instance: (I'm going to show my work here)

(4,2)(3,-7)
2-(-7) / 4-3 = 9/1
y=mx+b
1/9 + 2
y=9/1x + 2

and when you graph it you put the dot at the 2 place and go up 9 lines and go out one line and put a dot (then connect those dots) BUT you can also put a dot at the 2, go out one, and up 9 and put a dot there (then connect) and it's the same thing. Do you think rise over run can also be run over rise???

Kristin’s thoughts focus on how you draw a line with a given slope: you can either go up 9 and then right 1, or right 1 and then up 9. But these are both a slope of 9/1 = 9; they are just two ways to get to the same point. Kristin didn’t relate these to the equation, to see whether both equations give the same point.

But I saw two separate points to discuss:

There are at least two questions here: why do we define the word "slope" to mean the ratio of "rise" to "run", and why is that the right number to use in the slope-intercept form of a line? The answer to the first question is that any number we assign to a "slope" ought to be bigger when the line is sloped more steeply. We want the "slope" to tell us how much the line is sloped. A steeper line goes up more in the same distance:

If we used the "run" over the "rise", then the less steep line would have a greater slope, which wouldn't make sense:

So the ratio as we define it fits the word “slope”, by being greater when the line slopes more. Now the second issue:

How about the question of what goes in the "m" spot in the equation? Let's look at the intercept and one other point on the line:

What is the slope of that line?

    m = (y - b)/(x - 0) = (y - b)/x

If we multiply both sides of the equation by x, we get mx = y - b, and adding b to both sides gives mx + b = y. So if we define slope as rise/run, then this is the equation that any point on the line has to fit. If you defined slope differently, you would get a different equation. So here's how it works: we define slope in a way that makes sense based on what the word "slope" means; then we find that we can use that slope value in an equation that describes any point on the line. One could use a different formula for something like "slope", and get a different equation; but this one gives us a reasonable definition AND a nice little equation to use it in.

I didn’t follow up on the idea of getting a different equation using the upside-down definition. Let’s do that now. Suppose we define “new slope” as \(\displaystyle n = \frac{x_2-x_1}{y_2-y_1}\). Then, in the situation above, we have \(\displaystyle n = \frac{x}{y-b}\), leading to the equation \(\displaystyle y = \frac{x}{n} + b\). Nothing really wrong with that as an equation … but then we would have students asking why we have to divide by n rather than multiplying by n. And that would motivate flipping the definition of n.
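As a sanity check (my own illustration; the variable names are mine, and the numbers m = 9, b = 2 are simply taken from Kristin's work), we can verify numerically that the "new slope" form y = x/n + b and the familiar y = mx + b describe the same line when n = 1/m:

```python
# Illustrative numbers from Kristin's work: m = 9, b = 2.
m, b = 9.0, 2.0
n = 1.0 / m        # the hypothetical "new slope": run over rise

# y = m*x + b and y = x/n + b give the same point for every x:
for x in [-2.0, 0.0, 1.0, 3.5]:
    assert abs((m * x + b) - (x / n + b)) < 1e-9
print("both forms describe the same line")
```

So nothing breaks with the flipped definition; it just forces a division where a multiplication would do.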

Doctor Ian gave similar reasoning here, in 2008:

Why Does the Slope Formula Work?

To find the slope of a line, you do the formula y2-y1/x2-x1, when x1 is not equal to x2. But why does this formula work? I don't understand the relationship of the subtraction and division with slope. Is it simply an equation that creates different numbers which signify different angles?

His focus was on gaining an understanding by going through the process of inventing the concept oneself:

In many cases, the best way to understand something is to re-invent it for yourself. Suppose I have two lines, like

[diagram: segments AB and AC rising from A, with AB steeper than AC]

We would say that AB is "steeper" than AC, right? But how could you QUANTIFY that? What kind of calculation could you do to attach a number to the idea of "steepness"? That's what we're doing with slope. We're trying to capture that concept in a calculation. Now, in the case of AB and AC, they both have the same amount of vertical travel ("rise"). But AB has less horizontal travel ("run"). If we divide rise by run, we get a larger number for AB, which is nice, because it makes sense that a larger value should correspond to a steeper line. Now, where does subtraction come in? Well, suppose we think of each line segment as the hypotenuse of a right triangle:

[diagram: each segment drawn as the hypotenuse of a right triangle, with a vertical and a horizontal leg]

If we know the coordinates of A, B, and C, can you see how we can use subtraction to find the rise and run for each segment?

Another point might be worth making here: Wouldn’t we get an equally good measure of steepness (maybe even *more* natural) by making it rise/drive, that is, the rise divided by the distance we actually travel along the line? On the surface, that makes sense; the only problem is that the equations you’d get would be horrendous. The slope formula would become \(\displaystyle s = \frac{y_2 - y_1}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}}\), and the equation of the line would be \(\displaystyle y = \frac{sx}{\sqrt{1 - s^2}} + b\), which amounts to converting our new slope s to the familiar m before using it.
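To make that concrete, here is a small numeric sketch (my own illustration, with names I invented) showing that this "rise over distance traveled" slope s always lies strictly between -1 and 1, and that the formula m = s/√(1 - s²) recovers the ordinary slope:

```python
import math

def drive_slope(m):
    """Hypothetical "rise over drive": rise divided by distance along the line."""
    return m / math.sqrt(1 + m * m)

def to_ordinary_slope(s):
    """Invert the relationship: m = s / sqrt(1 - s^2)."""
    return s / math.sqrt(1 - s * s)

for m in [0.5, 1.0, 9.0]:
    s = drive_slope(m)
    assert -1 < s < 1                            # a fraction of the travel
    assert abs(to_ordinary_slope(s) - m) < 1e-9  # round-trips to ordinary slope
print("round trip agrees")
```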

My conclusion: It is really the convenience of the equation, more than anything, that drives the choice of definition.

By 2012 I had changed my own perspective to what I just stated, so I was glad of an opportunity to answer the question again.

Here is the question that came in then, from teacher Sherrie, who gave just the right setup for what I wanted to say:

Why Y Rises and X Runs

I have researched online for a reason why slope is y over x, and have not found any answers. The formula can be derived by the point-slope form of a line, but which came first -- the chicken or the egg? I have usually told my class that some guy (because apparently it was the guys who used to do all the math) arbitrarily decided that we were going to look at the change of a line as rise over run rather than the other way around.

I started with the usual reasoning based on the meaning of the word “slope”:

We'd like "slope" to be a number that is greater when a line is more sloped (steeper), and smaller (by which I mean, closer to zero) when the line is less sloped. So the slope should say how fast the line RISES as you move forward. That makes it the ratio of the distance risen to the distance moved forward, just as speed (how fast you move forward as time passes) is measured as the ratio of distance moved to time elapsed. One way to answer this kind of question (even if you don't know the answer ahead of time) is to experiment with alternatives. Draw a few lines (say, with slopes 1/2, 1, and 2) and note what they look like; then calculate what their "slopes" would be if you defined them as change in x over change in y. Which numbers make more sense as a measure of slope?
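That suggested experiment is easy to carry out numerically; this sketch (mine, not part of the original answer) computes both candidate definitions for lines of slopes 1/2, 1, and 2:

```python
# For lines of increasing steepness, compare the two candidate definitions
# of slope: rise/run grows with steepness; run/rise shrinks, which would
# give the steepest line the *smallest* "slope".
lines = [(1, 2), (1, 1), (2, 1)]          # (rise, run): slopes 1/2, 1, 2
rise_over_run = [rise / run for rise, run in lines]
run_over_rise = [run / rise for rise, run in lines]
print(rise_over_run)   # [0.5, 1.0, 2.0] -- increases with steepness
print(run_over_rise)   # [2.0, 1.0, 0.5] -- decreases: the wrong way around
```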

Then I dealt with the equation in a new way, starting with the equation:

You're also right that the concept of slope can be derived from the equation of a line. Starting with the simplest form, y = ax + b, you can ask what "a" and "b" MEAN. You find that "b" is the point on the y-axis where the line crosses (so you invent the term "y-intercept" and decide that it is useful). You also find that "a" represents the rate at which the line rises. (By the way, in America we traditionally call that "m" for no good reason; "a" just tends to get used in other ways.) This leads you to decide that this ratio is also useful, and the name "slope" makes good sense for it. So, the name "slope" fits the definition "change in y over change in x," and that concept is useful because of slope-intercept form. Maybe the egg came first, but the chicken made it worth incubating!

From this perspective, it is quite possible that the simple form of the equation came first, with no particular names attached to the coefficients; and the names were given based on their meaning. So maybe someone didn’t come along and ask, “How shall we define something called ‘slope’?”, but rather, they recognized that this concept fit well with a basic linear equation, so it deserved a name – and “slope” fit perfectly!

Incidentally, this can also explain why the names we give the parameters (m and b in the U.S., various other combinations elsewhere) don’t stand for anything (like “s” for “slope”, as many students ask us). They are just parameters in the equation, which happen to have meanings, rather than starting out as something with a meaning (like “r” for radius in the equation of a circle). For more on this, see

Why b for Intercept?

You may also find this relevant:

Order in Linear Expressions

I had a little more to add:

One more thought: As students move on, they will find slope showing up in many other places, where the same definition is still useful. It represents the rate of change of y with respect to x, and therefore the rate at which any function changes. In calculus, this becomes the derivative. In physics, it is a speed. In business, it may show up as a price per unit, or as a cost per hour. A slope is a rate, and it is always the ratio of the change in the dependent variable to the change in the independent variable. So a third reason we define slope as we do is that it is a very fertile concept that becomes more and more important as you learn more.

And that is often the ultimate reason for a definition: it becomes popular because it makes subsequent math work well. All the ideas I listed are just different names for the same concept, and all must be defined as “output change over input change” in order both to mean what we want, and to behave conveniently (for example, you can add speeds together).

This reminds me of a case where both orders are used. In the U.S., we measure the efficiency of a car in **miles per gallon (mpg)**: how many miles you get per gallon used. In some metric countries, **kilometers per liter** **(km/L)** is used similarly. Here a larger number is better; this ratio represents **efficiency**, and we are thinking of fuel as input and miles driven as output. (That makes sense: It’s what a car does!)

But in other places, the usual measurement is **liters per 100 kilometers (L/100 km)**. This number represents the **rate of usage**: how much fuel you use to go a certain distance. Here a smaller number is better. They are thinking of distance driven as input and fuel usage as the result. (That makes sense, too: It’s what you do when you drive a car.)

Both approaches, though opposite, make sense; the first is most natural as part of figuring out how far you can go with the fuel you have, while the second is most natural for deciding how much fuel it will take to go a certain distance.
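The two conventions are just reciprocal rates, so converting between them is a single division. Here is a small converter of my own (the unit constants are the standard definitions, not from the article):

```python
KM_PER_MILE = 1.609344               # exact, by definition
LITERS_PER_US_GALLON = 3.785411784   # exact, by definition

def mpg_to_l_per_100km(mpg):
    """Convert miles per gallon (output per input) to liters per 100 km
    (input per output) -- the reciprocal convention, rescaled."""
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100.0 / km_per_liter

print(round(mpg_to_l_per_100km(30), 2))   # a 30 mpg car uses about 7.84 L/100 km
```

Note that a higher mpg gives a lower L/100 km, matching the observation that "larger is better" flips between the two conventions.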

But always, a rate is the same as a slope: the rate of change of an output per unit change of input. What varies is what we are thinking of as the output.

Sometimes a problem leads to a very interesting discussion that brings out many good ideas – but then turns out to be something entirely different, which brings out even more (and simpler) ideas. This polynomial equation problem we helped with last week was like that. I will not be quoting the whole discussion as it took place, to avoid confusion, but just exploring some of the places we went.

The initial question was simply stated, but clearly difficult:

(x+1)(x+2)(x+3)(x+4)=99

Find the real roots of the equation above.

The direct method of solution is to expand the left side and write it as a typical polynomial equation: \(x^4 + 10 x^3 + 35 x^2 + 50 x - 75 = 0\). But this is a fourth-degree polynomial equation, which is not generally solvable without great effort. I quickly checked whether it had a rational root, which would have to be a factor of 75; that didn’t look promising, and in fact I could see that the answer was no, without checking all the possibilities, by imagining what the graph of the left-hand side of the original equation would look like. The function \(f(x) = (x+1)(x+2)(x+3)(x+4)\) has x-intercepts at x = -1, -2, -3, and -4, so from my experience with polynomials I knew the graph would be something like this:

The real roots of the equation will be the x-coordinates of the points where this graph crosses the horizontal line \(y = 99\). Checking a couple of values, \(f(0) = 24\) and \(f(1) = 120\), so it must cross between x = 0 and x = 1. Since the leading coefficient is 1, any rational roots would have to be integers; a root strictly between 0 and 1 can't be an integer, so there aren't any rational roots.

But from graphing, I knew that the graph is symmetrical. I don’t know a specific theorem that proves this must be true if the zeros are symmetrical as these are, but I’m sure it can be proved. (This is why I only bothered to check for a root on the right; the one on the left will be symmetrical with it.) I also had a pretty good guess that the “wiggles” in the graph would not be big enough to reach 99, so our equation would have only two real roots. (I could check that by evaluating the peak’s height, \(f(-2.5) = 0.5625\)).
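In fact the symmetry is easy to justify: substituting x = -2.5 + t turns the product into \((t-1.5)(t-0.5)(t+0.5)(t+1.5)\), which is unchanged when t is replaced by -t. A quick numeric check (my own, not from the discussion) confirms both the symmetry and the height of the middle hump:

```python
def f(x):
    return (x + 1) * (x + 2) * (x + 3) * (x + 4)

# f is symmetric about x = -2.5, the mean of its zeros:
for t in [0.3, 1.0, 2.7, 5.0]:
    assert abs(f(-2.5 + t) - f(-2.5 - t)) < 1e-9

print(f(-2.5))   # 0.5625 -- the middle hump never comes close to 99
```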

At this point I was out of standard ideas; I am by no means an expert on the theory of equations, as some other Math Doctors are! But I got an idea: If we shift the graph so that its line of symmetry is the y-axis, that might simplify things. It did. So I suggested trying this:

After playing with it for a while, I discovered an idea that worked. I’m sure there are others.

I observed that the function y = (x+1)(x+2)(x+3)(x+4) is a polynomial with zeros at -1, -2, -3, and -4, and experience told me that its graph will be symmetrical about the mean of those zeros, -2.5. (Just by imagining the graph and checking some numbers, I can also see that it will equal 99 between 0 and 1, and again between -5 and -6, giving two real solutions to the equation. But the rational root theorem shows that these can’t be rational numbers.)

So I tried making the symmetry visible, by translating the graph right by 2.5, by replacing x with u-2.5. Try doing that, and see what happens.

Here is the rest of my work following this idea, which I actually wrote up later, after the solution had been found by other means:

After substitution, we have

(u - 3/2)(u - 1/2)(u + 1/2)(u + 3/2) = [(u - 1/2)(u + 1/2)][(u - 3/2)(u + 3/2)] = (u^2 - 1/4)(u^2 - 9/4).

This has to equal 99:

(u^2 - 1/4)(u^2 - 9/4) = 99

Letting v = u^2, we get (v - 1/4)(v - 9/4) = 99, which expands to v^2 - 5/2 v + 9/16 = 99; multiplying by 16 to eliminate fractions, 16v^2 - 40v - 1575 = 0.

By the quadratic formula, I get v = (40 ± 320)/32 = 45/4 or -35/4. Since this is u^2 and we only want real roots, u = ±sqrt(45/4) = ±3 sqrt(5)/2.

Then, in turn, x = u - 5/2 = (-5 ± 3 sqrt(5))/2.
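We can check the final answer numerically (a verification sketch of my own, not part of the original write-up):

```python
import math

def f(x):
    return (x + 1) * (x + 2) * (x + 3) * (x + 4)

# The two real roots found above: x = (-5 ± 3*sqrt(5))/2
roots = [(-5 + 3 * math.sqrt(5)) / 2, (-5 - 3 * math.sqrt(5)) / 2]
for x in roots:
    assert abs(f(x) - 99) < 1e-8   # each satisfies the original equation
print(sum(roots))                  # sums to -5, up to rounding
```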

Incidentally, the problem was supposed to be solved without a calculator; I didn’t *need* one, but the work would have been a little cumbersome without it.

The student, in the meantime, found a different way to solve it, which is perhaps a little less intuitive, and must have been either an accidental discovery or an impressive bit of insight. Here is my version of what he did:

We can expand the left side partially, just multiplying (x+1)(x+4) and then (x+2)(x+3):

(x^2+5x+4)(x^2+5x+6) = 99

The interesting thing is that the factors both contain x^2 + 5x, which we can replace with t, giving

(t + 4)(t + 6) = 99

t^2 + 10t – 75 = 0

(t + 15)(t – 5) = 0

t = -15 or 5.

This implies that either

x^2 + 5x = -15 or x^2 + 5x = 5;

these yield

x = (-5 ± sqrt(-35))/2 and

x = (-5 ± sqrt(45))/2;

only the latter solutions are real, and they are the same as mine.

Like my solution, but more quickly, this reduced the quartic equation to a quadratic, which could be solved.
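The whole substitution method fits in a few lines of code; here is a sketch of it (my own translation of the student's algebra, with a helper function I named myself):

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (empty list if none)."""
    d = b * b - 4 * a * c
    if d < 0:
        return []
    return [(-b + sign * math.sqrt(d)) / (2 * a) for sign in (1, -1)]

# Step 1: t = x^2 + 5x turns (t+4)(t+6) = 99 into t^2 + 10t - 75 = 0.
t_values = quadratic_roots(1, 10, -75)        # gives 5 and -15

# Step 2: solve x^2 + 5x = t for each t; only t = 5 yields real roots.
real_x = []
for t in t_values:
    real_x += quadratic_roots(1, 5, -t)

print(sorted(real_x))   # the two real roots, (-5 ± sqrt(45))/2
```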

But in the process of discussing all this, the student revealed that he had accidentally omitted part of the problem, which should have been this:

(x+1)(x+2)(x+3)(x+4)=99

Find

the sum ofthe real roots of the equation above.

Since the real roots were \(\frac{-5 \pm \sqrt{45}}{2}\), their sum is -5, so that is the answer.

But that problem could have been solved instantly using my original observations that the graph is symmetrical about x = -2.5, and that there are only two real roots:

There could conceivably be four real roots of the equation (that is, places where the graph crosses y = 99), if the peak in the middle of the graph were high enough. That would happen, in fact, if we replaced 99 with 0.5. They probably chose 99 to be so high that there would be no question. Given that fact, we don't need to know the two real roots themselves. Because of symmetry, they will be -2.5 plus something, and -2.5 minus the same thing; their sum is -5: (-2.5 + p) + (-2.5 - p) = -5.

So most of the conversation was “wasted”, if you consider only what was actually needed in order to solve the problem; yet it took us into some very interesting areas on our long detour!

Incidentally, there is a lot known about quartic equations that is more advanced (and which I wouldn’t have used here even if I knew it well, because this problem clearly was meant to be solved simply). Here are some discussions we have had on such methods:

Ferrari's Method for Quartics
Factoring Quartics
Is There a 'Discriminant' for a Quartic Equation?
Is There a "Discriminant" for a Quartic Equation ... in Closed Form?
Solving the Quartic
Solving a Quartic Equation with Substitutions

This last one shows my method, exactly – in fact, I could have copied my work and explanation from here, if I had known about it! So it wasn’t a great new discovery; but since it was new to me, it was fun.

Here is one final thought: Since both my method and the student’s depend on the symmetry of the zeros, his method should work for the problem on that last page. In fact, it does; we get the same real roots that Doctor Douglas got, and a pair of ugly complex roots as well, with considerably less work. I am curious to know whether that, too, is a well-known trick.

We have had a number of questions over the years about inverse trig functions and their ranges. For today’s question, I have chosen one from 2011, which will link to a number of others that I will not quote in detail.

Here is the question:

Arcsin, Arccos, Arcsec Are Confusing in their Ranges
Is the domain for arcsin 0 to pi? For arccos, is it -pi/2 to pi/2? I'm confused about how these are determined. I think the domain of arcsin is 0 to pi because y is positive at these values. Is that the right reason? Also, what is the domain of arcsec?

Jayson basically got everything wrong (from the word “domain” to the intervals he chose and the reason he gave), so I had to start from scratch; but I did so in part by referring to past answers that handled it well. As it turns out, I spent most of my time on the last line, and going beyond that. But here’s the start:

When you say "domain," you really mean "range," right? The restricted domain used for the secant, etc., before taking the inverse, becomes the range of the inverse function. This range is not really "determined" as if we just have to study the function to find what its range MUST be. Rather, we make a partially arbitrary choice of a domain to which we can restrict the trig function -- a choice that will yield all possible values of the function exactly once, and be as well-behaved as possible. This is described here: Inverses of Trigonometric Functions http://mathforum.org/library/drmath/view/61051.html The general goal is to pick a range that is complete, contiguous (or as nearly so as possible), close to zero, and preferably positive rather than negative. For the sine, we can accomplish all but the last by using [-pi/2, pi/2], where sin(x) goes from -1 to 1. For the cosine, that wouldn't be one-to-one or give all possible values. But the next best choice, [0, pi], does meet all the goals. The tangent works the same as the sine, except that you can't include -pi/2 or pi/2.
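As a side note, Python’s math library happens to follow these standard choices, so the guidelines above can be checked numerically (my own illustration, not part of the original answer):

```python
import math

# Sample inputs across the domains and confirm the standard principal ranges:
# asin returns values in [-pi/2, pi/2]; acos returns values in [0, pi];
# atan (defined for all reals) returns values strictly inside (-pi/2, pi/2).
samples = [k / 100 for k in range(-100, 101)]

assert all(-math.pi / 2 <= math.asin(r) <= math.pi / 2 for r in samples)
assert all(0 <= math.acos(r) <= math.pi for r in samples)
assert all(-math.pi / 2 < math.atan(10 * r) < math.pi / 2 for r in samples)
print("principal ranges confirmed")
```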

In the reference, Doctor Rick explained why we need to restrict the domain of a trig function before making an inverse, and listed the usual choices. Since I can more easily include pictures than we could back then, let me add here a visual view.

Consider the sine and cosine first. In red below we see their graphs; neither is one-to-one until we choose some part of the graph, in blue. Each blue part has been chosen so that every possible value of y is present (what I called “complete” above) without repeating any, and the values of x are close to zero. Other choices are possible, but these are standard.

The inverse is found by interchanging the roles of x and y; the red parts would keep these from being functions, so we have chosen a range that makes it work:

The tangent is much the same as the sine:

Let’s continue with my answer:

When you get to the cotangent, secant, and cosecant, the right choice is not quite as obvious. That's especially true of the cotangent. At first you'd think the cotangent should obviously have the same domain as the cosine (without the end points), much as we do for the tangent with regard to the sine; that makes its graph continuous, and that is the usual choice. But, as I said, nothing forces us to make that choice, and there are some reasons in favor of instead choosing (-pi/2, 0) U (0, pi/2], even though that is not a contiguous interval. Here's a nice explanation I found (which happens to quote me!): http://www.squarecirclez.com/blog/which-is-the-correct-graph-of-arccot-x/6009 Now, in my quoted answer, I was not so much stating a strong opinion on this choice as I was giving a reason for the choice I was asked about, which happens to agree with our FAQ. My explanation for the contiguous range seems reasonable; why in the world would anyone choose the other?

The answer this site quotes is

Domain of Arccot

where, as I said, I was focused on justifying the domain (for the cotangent – here again, it’s really the *range* of arccot!) that a student had been taught, not on arguing in defense of it. Here are the two alternative versions:

The first choice has the advantage of being continuous (making one contiguous interval), and seems to be the usual one. (I made these graphs on Desmos.com, and the first is their graph of arccot(x); I had to construct the second graph piecewise.)

So, as I asked, why would anyone prefer that second one, as ugly as it is? The site that quoted me does a pretty good job of explaining it, but I might as well give my own perspective. Since one source that prefers it is Wolfram, I quoted their explanation, and analyzed that:

The following site states the convention chosen by Mathematica software: http://mathworld.wolfram.com/InverseCotangent.html There are at least two possible conventions for defining the inverse cotangent. This work follows the convention of Abramowitz and Stegun (1972, p. 79) and Mathematica, taking cot^(-1)x to have range (-pi/2,pi/2], a discontinuity at x = 0, ... This definition is also consistent, as it must be, with Mathematica's definition of ArcTan, so ArcCot[z] is equal to ArcTan[1/z]. A different but common convention (e.g., Zwillinger 1995, p. 466; Bronshtein and Semendyayev, 1997, p. 70; Jeffrey 2000, p. 125) defines the range of cot^(-1)x as (0,pi), thus giving a function that is continuous on the real line R. Extreme care should be taken when examining identities involving inverse trigonometric functions, since their range of applicability or precise form may differ depending on the convention being used. So the reason for their choice is that they want it to be always true that arccot(x) = arctan(1/x) This makes sense, since cot(x) = 1/tan(x) This implies that arccot and arctan have to have the same range!
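The two conventions are easy to compare side by side (a sketch of my own; the function names here are mine, not standard library functions):

```python
import math

# Two conventions for arccot, as discussed above:
# (1) the "continuous" choice, with range (0, pi);
# (2) Mathematica's choice, arccot(x) = arctan(1/x), which jumps at x = 0.
def arccot_continuous(x):
    return math.pi / 2 - math.atan(x)   # always lands in (0, pi)

def arccot_mathematica(x):
    return math.atan(1 / x)             # range inside (-pi/2, pi/2], x != 0

print(arccot_continuous(-1))   # 3*pi/4 ~ 2.356...
print(arccot_mathematica(-1))  # -pi/4 ~ -0.785...: same cotangent, different branch
```

Both outputs have cotangent \(-1\); the conventions simply pick different branches for negative inputs, which is exactly why identities must state which convention they assume.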

I went on to look at what the corresponding identities have to look like with the usual choice of range, and it isn’t pretty. So this choice starts to make sense.

I went on to look at the inverses of secant and cosecant, whose graphs on Desmos look like this:

This choice of range makes good sense: the range of arcsec agrees with arccos, and the range of arccsc agrees with arcsin, and is as close to contiguous as we can get. But there is an interesting issue with arcsec:

I don't find as much disagreement about the range of arcsec and arccsc. Here, the usual restricted domain is [0,pi/2) U (pi/2,pi] for secant, and [-pi/2,0) U (0,pi/2] for cosecant. These follow my guidelines above, the former matching cosine and the latter matching sine. This makes their reciprocal identity work nicely, too. The following page mentions the reason for an alternative choice, based on calculus: Differing Definitions of arcsec(x) Lead to Confusion over Signs http://mathforum.org/library/drmath/view/69193.html

On that page, I discuss the *derivative* of the arcsec function, which (using the usual definition) requires an absolute value: \(\frac{d}{dx}arcsec(x) = \frac{1}{|x| \sqrt{x^2-1}}\). The student had misunderstood why he wasn’t getting that result, which was a matter of keeping track of the signs of the various trig functions; no calculus is involved in that discussion. But in my final response I mentioned an article that claims that the arcsec is problematic in calculus courses, because some sources take the range of arcsec to be \(\left[0,\frac{\pi}{2}\right) \cup \left[\pi,\frac{3\pi}{2}\right)\), so that the derivative is \(\frac{d}{dx}arcsec(x) = \frac{1}{x \sqrt{x^2-1}}\). The graph is then like this:

This looks *very* strange, with a big gap between intervals – not only non-contiguous, but non-adjacent! But it makes the derivative nicer, essentially because the tangent is always positive in this range. On the other hand, this alternative definition changes the identity: we no longer have \(arcsec(x) = arccos\left(\frac{1}{x}\right)\), but would need something like \(arcsec(x) = 2\pi – arccos\left(\frac{1}{x}\right)\), for x < 0.

Wolfram’s MathWorld doesn’t clearly talk about the range of arcsec (focusing instead on complex variable issues), but what they say agrees with the usual version, and the related WolframAlpha graphs the function as expected. But they present the derivative in a tricky form that avoids the absolute value: \(\frac{d}{dx}arcsec(x) = \frac{1}{x^2 \sqrt{1-\frac{1}{x^2}}} = \frac{1}{x \sqrt{x^2-1}}\) for *x* > 0.
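The absolute value is easy to confirm numerically (my own sketch; here arcsec is defined by the usual convention, \(arcsec(x) = \arccos(1/x)\)):

```python
import math

# With the usual range [0, pi/2) U (pi/2, pi], arcsec(x) = acos(1/x),
# and its derivative needs the absolute value: 1 / (|x| sqrt(x^2 - 1)).
def arcsec(x):
    return math.acos(1 / x)

def d_arcsec(x):
    return 1 / (abs(x) * math.sqrt(x * x - 1))

# Central-difference check at a negative x, where the |x| actually matters:
x, h = -2.0, 1e-6
numeric = (arcsec(x + h) - arcsec(x - h)) / (2 * h)
print(numeric, d_arcsec(x))  # both ~ 0.2887 = 1/(2*sqrt(3))
```

Without the absolute value, the formula would give a *negative* slope at \(x=-2\), contradicting the graph, which rises on both branches under the usual definition.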

I have only rarely found a graph that uses this range for arcsec, so I can’t confirm the claim that it is common.

Ultimately, the answer to your question is (a) it's a choice, so to find what the range of an inverse trig function is, you just have to look in your text to see what they're using; and (b) the choice is made based on what will make the kind of math you're doing (trig identities, calculus, etc.) work best. That's true of a lot of definitions in math: convenience rules!

By the way, if you are wondering about the notation I have used, like arcsin, rather than the alternative terms like \(\sin^{-1}\), see

Trigonometry Terminology

First, how can something empty be called a set in the first place? Isn’t a “set” a collection of things? That means at least one, doesn’t it?

How Can a Set Be Empty?
Why is the empty or null set called a set when it has no elements? Is there a mathematical proof that it's a set?

Since this is a matter of definition, it can’t be *proved*; but it can be *justified*. In this case, I focused on what is called the closure property: We want the operations we do on sets to be “closed”, meaning the result will always still be a set.

We try to make our definitions so that they are as useful as possible. In this case, we would like all the operations we can do between sets to yield sets, just as we want addition and multiplication of two numbers to produce a number. Now, what happens when you take the intersection of a pair of disjoint sets (sets with no elements in common)? The result is an empty set, right? If we didn't call that a set, then in this (rather common) case, the result of the intersection operation would not be a set. This is typical of the way math is done. We make some natural definition (for example, thinking of a set as any collection of objects), and then work with it; eventually we find that we have to refine our definitions, or clarify the extreme cases, in order to make our new branch of mathematics work neatly. We can't "prove" that the empty set is a set, since we are defining it as such; but we do have to demonstrate that it is a useful and consistent definition that produces interesting mathematics. It does!
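The closure argument shows up directly in programming languages, where set types behave the same way (a small illustration of my own):

```python
# Intersection of disjoint sets: if the empty set were not a set,
# this very common operation would have no result of the right type.
evens = {2, 4, 6}
odds = {1, 3, 5}

both = evens & odds          # intersection of two disjoint sets
print(both == set())         # True: the result is the empty set
print(type(both) is set)     # True: the operation is closed -- still a set
```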

Here is another version of the same question, from a teacher:

Definition of Set and How the Empty Set Fits within It
The definition of a set is "A collection of well-defined objects". Whenever I teach this topic, my students become confused about the idea of the null set, because they think every collection must have some elements. I say to them that a collection can be empty, but still they are not satisfied. How can I get them to understand that the definition allows for an empty collection?

Doctor Tom, in a long answer that is worth reading, first points out that mathematicians don’t *formally* define a set this way, and introduces the idea of Axiomatic Set Theory, which gets around this. Then he talks about how students can be introduced to the ideas of sets *informally*:

I like to tell my students that a set is like a box that may or may not contain objects. So the set: {1, 2, 3} is a box containing those particular three numbers. The empty set is simply an empty box.

This is just a way of thinking about sets that makes the idea of an empty set feel more natural.

So let’s accept that an empty set makes sense. But what about this idea that *the* empty set (there’s only one) is a subset of *every* set?

Is an Empty Set a Subset?
The empty set is a subset of all sets, right? What is the proof of the example: For any event W in the sample space S, what is the proof that the empty set is a subset of W?

Here Anabelle is asking in the context of probability (where an “event” is a *set* of outcomes), but her question applies to sets in general. I gave not one, but three ways to think about it:

A subset of a given set is simply any set, all of whose elements are contained in the other. Since the empty set has no elements, all of its elements are in any other set! It sounds weird, but that's the way logic works. To put it another way, a set A is NOT a subset of B if there is some element x of A that is not in B. Since the empty set has no elements that are not in your given set, we can't say it is NOT a subset. That means that it is. To select a subset, we must look at each member of the set and decide whether to keep it. If we say "yes" to every member, we have the set itself; if we say "no" to all of them, we have the empty set. We could choose to exclude these from the definition of subset, but it makes a lot of things easier if we include them. That way there are no special cases to deal with when we state theorems.
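All three ways of thinking can be demonstrated with Python’s built-in set type (my own illustration):

```python
# The empty set is a subset of every set -- including itself.
empty = set()

print(empty.issubset({1, 2, 3}))  # True
print(empty.issubset(set()))      # True
print(empty <= {"elephant"})      # True: operator form of issubset

# The "checklist" view: saying "no" to every element of {1, 2, 3}
# yields the empty subset.
print({x for x in {1, 2, 3} if False} == empty)  # True
```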

The first answer depends on how mathematicians think of the word “all” when “all” is nothing; we’ll dig into this idea (called vacuous truth) below, so hold on if it makes no sense to you!

The second answer is a justification of the first. If we turn our perspective around and think about what it takes to recognize something that is *not* a subset, it makes a little more sense. This ties in to the idea I have discussed previously of “innocent until proven guilty”: if there is no proof that it is not a subset, then it is.

The third answer is like my answer to the first question above: This way of defining subsets makes other things work better than if we didn’t take it this way. Specifically, we have chosen to define “subset” so that if you select elements from a set by making a checklist of the elements of the full set and checking off those you want in the subset, *any* choice – including the choice *not* to include *any* – results in a subset. That makes a lot of theorems easier to state, because it is consistent.

Now, some people, trying to have fun with surprising mathematical ideas, take this a little too far. They point out that there is only **one empty set**, which is a subset of every other set; so they might say that the set of all elephants in Antarctica and the set of all living Tyrannosaurs are the *same set*. An adult, Amit, wrote to us in 2015 (not archived) asking about this; the trouble is that such a claim really involves **two different universal sets**. Whenever you deal with sets, you must always be working within some specific universe, or fallacies can result. So, as I told Amit, “they are probably overlooking the fact that their descriptions seem to imply different universes, so that they are really violating proper rules for sets, for the sake of humor or vividness. The point that there is only one empty set (within a universe) is true, but such an illustration probably goes too far.”

Now let’s look at that idea of vacuous truth. The following question is about more advanced math, involving functions from one set to another; you don’t have to know anything about that to follow the parts of the answer I will quote. Here is the question:

Vacuous Cases, Empty Sets, and Empty Functions
I am having difficulty understanding 'vacuous' situations as in, if A is an empty set and B is a non-empty set then (i) there is one function f: A \to B namely the empty function but (ii) there is no function f: B \to A. An empty set is a set with no element but what is an empty function? There is a function from an empty set to a non-empty set (how) but not vice-versa. I am used to the case A and B are non-empty so A x B does not go against (my) intuition. In the case A is empty and B is non-empty, A x B is non-empty but B x A is empty?

We won’t be looking here at the answer to the specific question about functions; but Doctor Jacques breaks his answer into two parts, and the first part is about “logical statements about the empty set”, which is our topic:

Let us first consider a statement about the elements of a set A. Assume S(x) is a statement about the object x (a logical proposition): depending on the particular object x, S(x) is either true or false. We can make a statement S(A) about the set A, by asserting that S(x) is true for every element of A:

S(A) ::= "For all x in A, S(x) is true".

For example, assume that x represents a ball, and S(x) is the statement "the ball x is red". Now, if A is a bag of balls, S(A) would mean: "For all balls x in the bag A, the ball x is red" or, more simply said: "All the balls in A are red".

The question is now, what does this mean if A is empty? S(A) can only be false if you can find in A a ball that is not red. If A is empty, this is impossible, so S(A) cannot be false, and we conclude that S(A) is true--if the bag is empty, all the balls in it are red (although there are no balls at all). Note that it is also true that all the balls in the bag are black--there is no contradiction in this if the bag is empty.

In a more abstract way, if S(A) is a statement of the form "For all x in A, S(x) is true", then, whenever A is empty, S(A) is true--this does not depend on the particular form of the statement S(x).

We can also see it in another way--S(A) means that A is a subset of the set of objects such that S(x) is true. Now, the empty set is a subset of any set, so, if A is empty, A is indeed a subset of the set of objects that verify S(x), and S(A) is true.

Here we have several ideas connected: “**for all** x in A” (the universal quantifier) is equivalent to “**if** x is in A” (a conditional statement); and also to “A is a **subset** of …”. As I have previously discussed why a conditional statement is considered true when its condition is false, the same reasoning applies here to the case where there are no x in A.

Not long before that answer, Doctor Jacques had answered a question about proving properties of relations, which you can find here:

Properties of Relation

Even the question depends on knowledge I don’t want to get into, but he starts out by preparing the student for some special situations, which involve vacuous truth:

I think we should first clarify a few issues about mathematical logic, and, in particular, the meaning of statements related to the empty set.

When we say "If P then Q", or "P -> Q", this simply means that P is false or Q is true (or both). This has some consequences that may appear surprising (until you get used to them). For example: "If 6 is prime, then 11 is negative" is a true statement, because the "If" part is false. A statement like P -> Q does not mean that there is any "logical relationship" between P and Q.

Consider now what happens when we make statements about elements of a set. Let us say we have a statement: "For all x in S, P(x)", where P(x) is some statement about x. This is equivalent to saying: x is in S -> P(x). What does this mean if S happens to be the empty set? In that case, the left part ("x is in S") is false, and therefore the complete statement is true, whatever P(x) may mean. We can say that any property is true when applied to the elements of the empty set. For example, in a universe without birds, both statements "All birds are green" and "All birds are red" are true, and this does not create a contradiction.

Another way to see this is the following example. Assume that you have to inspect bags that may contain red and blue balls, and you want a procedure to decide whether or not all the balls in a given bag are red. This means that, given a bag B, you want to decide whether or not it is true that: "For all x in B, x is red". The procedure would be executed as follows:

(1) if there are no more balls in the bag, exit and return TRUE (i.e. declare that the statement is true)
(2) pick a ball from the bag
(3) if the ball is not red, exit and return FALSE
(4) otherwise, go back to step (1)

Note that you must execute step (1) at the beginning, because step (2) is illegal if there are no more balls. Now, you can see that, if a particular bag is empty, the procedure will immediately terminate at step (1), and declare the statement true.

This shows that this interpretation is consistent. On the other hand (we will not use that here), a statement like "There exists x in S such that P(x)" means "(There exists x in S) AND (P(x))" and, if S happens to be the empty set, this statement is false (whatever P(x) may mean), because the first part of the AND statement is false.

To summarize:

* "If P then Q", or "P -> Q", means "P is false or Q is true" (and nothing else).
* The statement "For all x in S, P(x)" is always true when S is the empty set.
* The statement "There exists x in S such that P(x)" is always false if S is the empty set.

The latter two statements do not depend on the definition of P(x).
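Doctor Jacques’s bag-inspection procedure translates directly into code (my own rendering; Python’s built-in all() and any() encode exactly the two quantifiers he summarizes):

```python
# The bag-inspection procedure, step by step:
# an empty bag exits immediately at step (1), returning True (vacuous truth).
def all_red(bag):
    bag = list(bag)           # work on a copy so the caller's bag survives
    while True:
        if not bag:           # step (1): no balls left -> statement is true
            return True
        ball = bag.pop()      # step (2): pick a ball
        if ball != "red":     # step (3): a non-red ball falsifies the claim
            return False
                              # step (4): otherwise loop back to step (1)

print(all_red([]))                        # True: all balls in an empty bag are red
print(all_red(["red", "red"]))            # True
print(all_red(["red", "blue"]))           # False

# The built-in quantifiers agree:
print(all(b == "red" for b in []))        # True:  "for all" over the empty set
print(any(b == "red" for b in []))        # False: "there exists" over the empty set
```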

This ties together much of what we have been saying.

Last month we had a question from a Czech student asking about a geometry problem. The discussion illustrates language issues that can arise, and how we try to guide a student to solve a problem himself. I will fill in some gaps as we examine how to approach an interesting problem.

Here is Adam’s question:

Hello,

I need help solving a math problem, a good friend asked me for help, but I wasn’t able to solve it and now I am asking you for help.

The problem:

You have a Square K,L,M,N and a point A, that lies inside of the square.

Find all lines, named x, y, for which A is the center and the circumference of the square is their edge. I hope you understand me, my English is really not that good, here is a drawing I made for it to be a bit clearer. The black is given but can be chosen freely (except the centre, which would give an infinite amount of results and on the parallels, those are obvious), the red would be the result, but it has to be proven

Several details were unclear, so Doctor Rick asked for clarification, restating the problem as he understood it, and asking what kind of answer is needed (which is something people often omit, because it is obvious to them, knowing the context of the problem):

Hi, Adam.

I am not sure I understand every detail of what you’ve said. I understand it to be this:

Given a square KLMN and an arbitrary point A inside the square, identify all pairs of points X, Y on the perimeter of the square such that A is the midpoint of segment XY (so that |AX| = |AY|). What isn’t clear is how you want the points X and Y to be specified. We could, for instance, use coordinate geometry. Let the vertices of the square be K (0, 0), L (s, 0), M (s, s), and N (0, s), where s is the side of the square; and let A be (a, b). Then we can give the coordinates of X and Y in terms of s, a, and b. Is that the sort of thing you have in mind? If not, can you say what you would consider an appropriate way to give a solution to the problem?

I understand that if A is at the center, then for any point X on the perimeter of the square, choosing Y at the opposite point on the square will fulfill the conditions. I’m not sure what you mean by “on the parallels”. If you mean on either of the lines through the center and parallel to a side of the square, that description fits the point A in your figure; I consider this to be “obvious”, but it might not be what you’re thinking about. Maybe you mean the diagonals KM and LN; in that case there is one solution and it’s reasonably obvious.

Let me know what you have in mind, and we can work together. I also want to know something about why your friend wants to solve this, and how you and he/she have tried to solve it.

Adam clarified, showing that the context is pure geometry, not analytic geometry:

Thank you for your answer, the problem has to be solved geometrically, by drawing. Yes, the ones that you said are obvious, what is problematic is finding the solution when A is placed in a random place.

My friend needs this problem solved for homework, she asked her classmates and me, because she wasn’t able to find out how to solve it, but no one she asked yet was. I attempted many things, even brute forcing it through trial and error, but I am unable to find a logical solution to it.

Now we could move forward; but in order to give help rather than an answer, it is necessary to have some idea of what knowledge the student has that could be used to discover a solution:

OK, I understand now that you’re looking for a geometric construction of points X and Y. That helps a lot!

What you haven’t told me yet is what your friend has learned in class that might be useful. I see a very easy construction — in fact, I’d say extremely easy (one step). It might be seen in several ways. For instance, you might use some knowledge of circles, or of right triangles, or maybe just congruent triangles.

Your figure may be misleading you, because it applies to one of the special cases I mentioned. If I were you, I’d start with a figure in which A is closer to one vertex of the square (say, M) than to the others. Do you realize how I know that A in your figure is on the line through the center and parallel to KL? In case you don’t, it’s because you put X and Y on opposite sides of the square. So in your new figure, don’t do that!

There is a very important point here, related to what I have said previously about the role of diagrams in geometric proofs. When the diagram one is looking at represents a special case, one sees things that are not relevant to the general case one wants to solve; so we want to **avoid special cases**. This is why, for example, I draw a scalene triangle when experimenting with a problem about triangles, to avoid seeing things that only apply to isosceles triangles.

The figure Adam supplied shows X and Y on *opposite sides* of the square, which forces the midpoint of XY to lie on a parallel line through the center, the very case Adam has already solved and is not asking about. This tells us that X and Y in the *general* case will always lie on *adjacent sides*. If I were drawing a figure, I would probably **work backward** (as suggested in the earlier post), first putting X and Y on adjacent sides, then finding their midpoint A. The resulting figure (you will see Doctor Rick’s version soon) represents the state of things *after* the problem has been solved, as if an arbitrary point A had been chosen. I would then look for relationships that could be used to construct X and Y if only A were given.

Adam responded with an appropriate figure, but without drawing XY as I just suggested, and without making any real progress:

Ok, see the attachment for the example, no matter what I should or should not know, what is the simplest way of finding the geometric construction for X and Y? I know how to find them if A lies on the diameters [diagonals] or parallels, that is simple, but not when it is randomly placed. Maybe I am just missing something obvious though.

Here is the point at which it can be hard to help: He is acting a little impatient, asking for “the simplest way” to find the solution; but we still don’t know what he or his friend knows that could be of use – what solution *they* will most easily find given their background. Any hint we give will be pushing them along a path that might not be natural to them, and not helping them learn how to find solutions *without* help.

With hindsight, I might have suggested drawing the figure in reverse as I said above, which doesn’t give much of a shove in any particular direction; Doctor Rick went one step beyond that, showing that figure with an additional line that reveals a powerful idea:

Hi, Adam. Yes, what you’re missing is fairly obvious, once you see it. Maybe the attached figure will be sufficient.

He chose X and Y arbitrarily (using Geogebra, which is an excellent tool for experimenting with geometric figures), constructed the midpoint A, and then drew in AL to the nearest vertex of the square. This reveals two (provably) isosceles triangles, showing that X and Y are the same distance from A as L, and leading to the simple construction Doctor Rick mentioned at the start … if you see it!

Oh, so the line |LA| works as the arms for two isosceles triangles?

And there can be even two results, right, if the point A is closer to the center then there can be a line going from L through A, and further through the square?

Adam saw the immediate implication, but didn’t mention how to construct it.

Doctor Rick responded:

Yes, AX = AY = AL. Do you see the one step needed for the construction?

I don’t see what you’re thinking in regard to “a line going from L through A and further through the square.” How would that line (LA extended) relate to an alternate pair of points X and Y?

I don’t have a formal proof that there is only one solution when A is not on what you call the “parallels,” but I’ve pretty well convinced myself of that informally. There is plenty more to talk about in relation to this problem, so feel free.

Adam wanted to move on:

Yeah, my bad, you are right, it would have to lie in the middle of the square for that to be correct.

If you have time, there is a similar problem, it would be best if you hint the way like last time, even though you pretty much solved it for me, for which I am very grateful …

But there was more for Doctor Rick to say:

Hi, Adam. You’re right, in the end I practically gave it away, and I would prefer that you do more of the thinking. If you’re satisfied with the state in which we left that problem, OK, but as I said, there is more that both of us could think about! For one thing, you didn’t answer my question about the one step needed to do the construction. That isn’t hard at all, so I hope you just didn’t bother to say it.

Adam still didn’t quite state what the construction was:

Yes, I see the construction step: by connecting the point A to the nearest apex [vertex], you find the length of AX and AY, because it creates a double triangle sort of construction. I am not sure how I would be able to solve the problem without prior knowledge of such a property though.

Doctor Rick finally stated the answer directly:

I don’t think you do see the construction step. It is to

draw a circle with center A, passing through L (or whichever vertex of the square is closest to A); the intersections of this circle with the adjacent sides of the square are X and Y. Perhaps you are not familiar with classical construction, or perhaps that is not really the form of solution you need. It is not clear to me what you mean by “without prior knowledge of such a property.” Are you saying simply that you wouldn’t have thought of this construction on your own? That’s why I should have given you less in the way of hints, so you’d have to exercise your mind more — that’s what homework is for.

But in fact Adam had seen the construction:

Oh yeah, I did that in my construction, I just didn’t realise that is what you were asking haha, I explained to my friend, that she has to use a compass, put the point in A and draw a circle from L around, the points of conflict being the x and y, sorry for the confusion.

Drawing the circle is a beautifully simple answer; this is a very nice problem for demonstrating how to think creatively and find a way to do something that is not at all obvious at first.

(Now, the problem required finding *all* such lines, including the special cases, and also required *proof*; so Adam is not finished. But we have good reason to believe he can handle the rest.)

After this, they were off looking at another interestingly simple problem, which I don’t have room for here …

We occasionally got questions about probability density functions (PDFs) from students who lacked a full picture of what they are; when I searched for references to give them, I never found one that explained the whole concept as I wanted. When the following question came in, I took it as an opportunity to create that reference. This week, a student asked some questions far above her current knowledge that touched on this topic; I referred her to this page, and took note that it might be one to add to the blog as well.

Mike, in 2014, was looking at the subject from a fairly advanced perspective, knowing enough calculus to talk about it in detail; others, without calculus, write to us having been introduced to the normal distribution curve and the basic idea that “the area under the curve is the probability”, but not knowing anything more. I tried to aim my answer at a level that could help anyone.

PDFs Explained: From Histograms to Calculus What are the output values of the probability density function (PDF)? And how does the integral of the PDF yield the probability? I get confused thinking about the area as a probability. Look at the example of the odds of k heads for n flips of a fair coin. The output values are the corresponding probabilities, and inputs are k. I know that k goes to the z-score, but what about the probability outputs as n goes to infinity for the binomial distribution? Are they still thought of as probabilities? After all, the curve is approached as n tends to infinity. How is it that, when n tends to infinity, and the shape approaches the PDF, we can take the area to arrive at the probability? I have looked at different explanations of the development of the PDF. To a certain extent, I understand its theoretical development: I understand that this is the continuous case of the binomial distribution, developed from setting up a differential equation with some basic assumptions; I know that DeMoivre derived the curve from Stirling's Approximation and some other mathematical trickery. However, this is not very enlightening, as it is the continuous case of binomial distribution.

Mike is primarily referring to the normal distribution, which many people see even without ever being taught what a PDF is in general. Here is a good introduction:

https://www.mathsisfun.com/data/standard-normal-distribution.html

He had gone beyond that, seeing deep explanations of the meaning of the normal distribution, such as this one:

Deriving the Normal From the Binomial

But it seemed that the underlying ideas were obscured by the detail; he wanted to see the basic concepts, such as what the vertical axis of a PDF even means, and how area comes to be involved. I chose to start with the basics of what a PDF is, to put it all in context. I started with a basic histogram, which is easy to understand.

The normal distribution can be derived from many different starting points; the limit of the binomial distribution is just one of them. But I think your real question is about what a PDF means in the first place, and how it is related to histograms. In particular, what does the value of the PDF at a single point mean -- since it clearly is not the probability of that specific value, which would always be zero -- and why is it the AREA under the curve that gives a probability? Am I right about this?

When histograms are first introduced, we tend not to present them in the form that is directly related to the idea of a PDF. In their more advanced form, histograms are all about areas. So let's develop that idea, starting with the most basic concept and moving toward the PDF.

The most basic form of histogram is just a bar chart showing the frequency with which either discrete values, or values within "bins" or classes, occur. At this level, the bins are typically equal in width:

    Freq
      |
    50|           +-----+-----+
      |           |     |     |
      |           |     |     |
    40|     +-----+     |     |
      |     |     |     |     |
      |     |     |     |     |
    30|     |     |     +-----+
      |     |     |     |     |
      |     |     |     |     |
    20|     |     |     |     +-----+
      |     |     |     |     |     |
      |     |     |     |     |     |
    10+-----+     |     |     |     |
      |     |     |     |     |     |
      |     |     |     |     |     |
     0+-----+-----+-----+-----+-----+-----+
        0-9  10-19 20-29 30-39 40-49 50-59

In this case, the total number of outcomes is 10 + 40 + 50 + 50 + 30 + 20 = 200.

Here is a nicer version of the histogram:

We can modify this to show the probability (relative frequency) of each bin, by dividing each frequency by the total of 200:

    Prob
       |
       |           +-----+-----+
       |           |     |     |
    0.2|     +-----+     |     |
       |     |     |     |     |
       |     |     |     +-----+
       |     |     |     |     |
    0.1|     |     |     |     +-----+
       |     |     |     |     |     |
       +-----+     |     |     |     |
       |     |     |     |     |     |
      0+-----+-----+-----+-----+-----+-----+
         0-9  10-19 20-29 30-39 40-49 50-59

Now if you add up the probabilities of all bins, you get 0.05 + 0.20 + 0.25 + 0.25 + 0.15 + 0.10 = 1.00, as it should be. And if we think of the width of each bin as being 1, this would make the total area of the bars 1, since the area of a bar is its width times its height. Even if we don't literally say the width is 1, at least the area of each bar is PROPORTIONAL to the probability of each bin, which is what we visually expect: the biggest (not just tallest) bar should go with the most likely event.
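These numbers are easy to verify; here is a quick Python check (the bin labels and counts are the ones from the example above; the code itself is my own sketch, since the original page had none):

```python
# Frequencies from the example histogram, one count per bin.
freqs = {"0-9": 10, "10-19": 40, "20-29": 50,
         "30-39": 50, "40-49": 30, "50-59": 20}

total = sum(freqs.values())                       # 200 outcomes in all
probs = {b: f / total for b, f in freqs.items()}  # relative frequencies

for b, p in probs.items():
    print(f"{b}: {p:.2f}")

print(sum(probs.values()))  # the probabilities add up to 1
```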

Again, here is a more readable version:

At this point, we have a discrete probability distribution. But we need to give it a little twist before we can move to the continuous case:

But sometimes you need to use bins of different widths. This is discussed at length here:

Modal Class of a Histogram with Unequal Class Widths
http://mathforum.org/library/drmath/view/72241.html

If, as in the next example, we make a bin larger without making any other changes, we'll have a problem:

    Prob
       |
       +-----------+-----+-----+
       |           |     |     |
    0.2|           |     |     |
       |           |     |     |
       |           |     |     +-----+
       |           |     |     |     |
    0.1|           |     |     |     +-----+
       |           |     |     |     |     |
       |           |     |     |     |     |
       |           |     |     |     |     |
      0+-----------+-----+-----+-----+-----+
          0-19     20-29 30-39 40-49 50-59

Here the height of the first bin is the probability of the value being in that bin, which is the sum of the probabilities of the first two bins in the original. But the area is now too large. That bar is as high as the next two not because each value is this likely, but because there are more values in that bin. We really want the histogram to look like this:

    Prob
     ? |
       |           +-----+-----+
       |           |     |     |
    0.2|           |     |     |
       |           |     |     |
       |           |     |     +-----+
       +-----------+     |     |     |
    0.1|           |     |     |     +-----+
       |           |     |     |     |     |
       |           |     |     |     |     |
       |           |     |     |     |     |
      0+-----------+-----+-----+-----+-----+
          0-19     20-29 30-39 40-49 50-59

Here, the height is the AVERAGE of the heights of the two bins I combined, so the area is the same. But what does the height mean now? It's the probability DENSITY, defined as the probability of the bin divided by its width, so that the AREA of the bin is the probability of the bin.

I have to relabel the vertical axis. In my example, the width of the original bins is 10, so the probability density for them will be the probability divided by 10. I'll also switch over now from labeling the bins with ranges, such as "20-29," to just labeling them with boundaries. And for simplicity, I'll interpret 20-29 as meaning 20 <= x < 30, by thinking of the values as having been rounded down.

    Prob
    dens
        |           +-----+-----+
        |           |     |     |
    0.02|           |     |     |
        |           |     |     |
        |           |     |     +-----+
        +-----------+     |     |     |
    0.01|           |     |     |     +-----+
        |           |     |     |     |     |
        |           |     |     |     |     |
        |           |     |     |     |     |
       0+-----------+-----+-----+-----+-----+
        0          20    30    40    50    60

We now have a fully consistent meaning for the vertical axis, and a fully flexible way to handle the horizontal axis. This is what a histogram REALLY is!
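The density computation in this step can be sketched in a few lines of Python (the bins and probabilities are those of the example; the code itself is mine, not anything from the original answer):

```python
# (left edge, right edge, probability) for each bin, after merging
# the 0-9 and 10-19 bins into a single 0-19 bin as in the example.
bins = [
    (0, 20, 0.25),
    (20, 30, 0.25),
    (30, 40, 0.25),
    (40, 50, 0.15),
    (50, 60, 0.10),
]

# Density = probability / width, so that width * density (the AREA)
# recovers the probability of the bin.
densities = [(lo, hi, p / (hi - lo)) for lo, hi, p in bins]

for lo, hi, d in densities:
    print(f"{lo:2d}-{hi:2d}: density {d:.4f}, area {d * (hi - lo):.2f}")

total_area = sum(d * (hi - lo) for lo, hi, d in densities)
print(total_area)  # 1.0 (up to rounding): areas, not heights, carry probability
```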

Here is the final result of that development:

In my comment above, “for simplicity”, I was ignoring an important step that is commonly taken in moving from ranges to boundaries on the horizontal axis, which prepares for the move to continuous distributions. This is the distinction between “class limits” and “class boundaries”, and the related “continuity correction”. We never archived a full explanation of this, but you can find some of the relevant ideas here:

Class Intervals in Statistics

Now we are ready for the step to continuous probability, though a full understanding of this level requires some knowledge of calculus:

Now imagine having a truly continuous distribution with an infinite set of data, so that we can take narrower and narrower bins. The meaning of the height will not change as we change the binning; and the area between any two values on the horizontal axis will still represent the probability of a value falling in that interval. This is what we call a continuous probability density function, or PDF:

    Prob
    dens
        |
    0.03|                 *
        |                * *
        |               *   *
        |              *     *
        |             *       *
        |            *         *
    0.02|           *           *
        |          *             *
        |         *               *
        |        *                 *
        |       *                   *
        |      *                     *
    0.01|     *                       *
        |    *                         *
        |   *                           *
        |  *                             *
        | *                               *
        |*                                 *
       0*-----------+-----+-----+-----+-----*
        0          20    30    40    50    60

Putting it another way, the height represents the instantaneous rate of change of probability (increase in probability per infinitesimally small increase in an interval), so that the probability of an interval is the definite integral of the PDF -- that is, the area under the curve.
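To make "area = probability" concrete, here is a small numerical sketch. The particular curve, a normal distribution with mean 30 and standard deviation 13, is my own illustrative choice, not anything specified in the answer:

```python
import math

def normal_pdf(x, mu=30.0, sigma=13.0):
    """Height of the curve at x: a probability DENSITY, not a probability."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, n=10_000):
    """P(a <= X <= b) as the area under the PDF (midpoint-rule integral)."""
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h) * h for i in range(n))

print(normal_pdf(30))             # about 0.031 -- a density, not a probability
print(prob_between(20, 40))       # about 0.56 -- an area, hence a probability
print(prob_between(-1000, 1000))  # about 1.0 -- total area under the curve
```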

This curve represents a continuous distribution that might have produced the histograms we have been using; I tried to draw it so that the area under the curve between two values on the horizontal axis is the same as the area of the corresponding bars in the histogram (shown in the background):

At the end, I referred to some other sources, which go into more depth than I have, perhaps at the expense of clarity:

If you need more on either the meaning of a PDF in general, or on the relation of the normal to the binomial, see these pages:

Normal Distribution Curve
http://mathforum.org/library/drmath/view/57608.html

The idea of a probability density function
https://mathinsight.org/probability_density_function_idea

Probability density function
http://en.wikipedia.org/wiki/Probability_density_function

Apparently I guessed right about what was needed:

Thanks, Dr. Peterson. That was the exact explanation I was hunting for. The doctors of Dr. Math are brilliant at their explanations of mathematical concepts.

Thanks are always appreciated!

First, if you are not quite sure what these terms mean, here is an introduction to all of them, which hints at the basic answer to our question:

Range, Mean, Median, and Mode I have some questions that you may want to answer for me: 1. Why do we have to study range, mean, median, and mode? 2. Could you help me understand them more? 3. How is it going to help me later in life?

Doctor Stacey explains what each is, and why it might be used; a key comment near the end is, “Now, mean, median and mode are all good types of averages, and each works best in different types of situations.” None of them is *universally* “best”; that’s why they all exist! What’s best depends on your *needs* in a particular *situation* (context). That will serve as a useful foundation.

Here is a typical question, from 2001:

Which is the Best Description? I am asked to answer the question: "Mode, mean, range, median: which best describes the number of times "Popular Song" was played on the radio per day?" given the following information:

    Sunday:     9
    Monday:    13
    Tuesday:   13
    Wednesday: 12
    Thursday:  14
    Friday:     3
    Saturday:  10

I calculated that the range is 11, the median is 12, the mode is 13, and the mean is 10.5. How should this question be answered? Is there an answer? I would guess the answer to be mode, since 13 occurs twice in the list. But I don't know how to answer the question.

Here, there is a specific context, so at least Doctor Stacey’s ideas can be applied. But is there really a correct answer? Doctor TWE starts off with his own example:

Let's say we were comparing two basketball players, Anne and Rich. In five games they've made the following points:

    G#  Anne  Rich
    --  ----  ----
     1    12    10
     2     0    13
     3    13     9
     4    25     8
     5     0    10

When comparing means, Anne and Rich seem to be equivalent, since each averages 10 points per game. If we look at their median scores, Anne seems to be the better player; her median is 12 as compared to Rich's median of 10. If we consider their modes, the "most likely outcome" is that Anne will score 0 points, but Rich will score 10. Rich's mode is better than Anne's. Let's look at these numbers all together:

            Anne  Rich
            ----  ----
    Mean      10    10
    Median    12    10
    Mode       0    10

Who's better? It depends on what you're looking for. If, for example, you were to have an "office pool" on how many points Anne or Rich will score in the next game (and you had to get the score exactly to win the pool), you'd want to use their modes. If you had a pool and the winner was the one who came closest to the player's score, you'd want to use their means. If you wanted to know what score you'd have to get to have a better-than-even chance of outscoring Anne or Rich, you'd want to use the median (or more precisely, the median + 1). The numbers don't indicate a clear "winner," but they do paint a good picture of the types of players Anne and Rich are.
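Doctor TWE's table is easy to reproduce with Python's statistics module (the scores are his; the code is just an illustration of mine):

```python
from statistics import mean, median, mode

scores = {
    "Anne": [12, 0, 13, 25, 0],
    "Rich": [10, 13, 9, 8, 10],
}

# One line per player: the three "averages" side by side.
for name, pts in scores.items():
    print(f"{name}: mean {mean(pts)}, median {median(pts)}, mode {mode(pts)}")
```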

The point is that which is “best” depends not only on the *data* and their *meaning* (both of which we are given), but on how you will be *using* the result (which we are not). He continues,

My assumption is that the problem was not asked with a right or wrong answer in mind, but rather to see if the student can justify his or her choice. Does it ask the student why he or she gave their answer? This type of question can be used to test the student's understanding of the underlying concepts rather than just their ability to "number crunch."

This should be posed as an essay question, in which the reasoning is more important than the specific answer. Unfortunately, Sandy replied that it was given as a multiple-choice question. Doctor TWE responded:

This is why I dislike closed form (i.e. multiple choice, true/false, etc.) interpretation questions. There are valid arguments that can be made for any of the first three choices, depending on why the information is being sought. But in lieu of more information, I would say that the mode *probably* best describes the number of times the song was played. As a teacher, I avoid using multiple choice except for "factual" recall questions. (E.g. "which of the following is the median of the data set?") When it comes to interpreting data and understanding concepts, I want to see what the student is thinking, and the only way to do that is with open-ended questions. (E.g. "which measure of central tendency best describes the number of times the song was played on the radio per day? Defend your choice.")

I imagine that all of the Math Doctors, whether we are teachers or not, would agree with him on the last point; but we may well give different answers to the question itself! (I personally think the mean is a better answer here.)

Here is a question from 2009, with less context to the question, and more thought from the student:

Choosing between Median and [Mean] as the Best Representation I recently did a quiz relating to measures of central tendency (mean, median, mode, range). On it was this question: The set of numbers is: 1, 3, 9, 10, 13, 15, 25, 39, 58, 63. Which measure of central tendency best represents the data? I thought it was the mean but my teacher thought it was the median. The mean of the numbers is 23.6. The median of the numbers is 14. There is no mode. I am unsure as to what measure of central tendency best represents the set of numbers. I believe it is the mean because it factors in all of the numbers, including the very low and very high numbers; therefore it is not distorted by any very high or very low numbers. My teacher disagrees and thinks it is the median. My teacher believes that the mean is distorted by the bigger numbers; however, I believe that the median is distorted by the smaller numbers.
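Henry's two numbers check out; for example, in Python:

```python
from statistics import mean, median

data = [1, 3, 9, 10, 13, 15, 25, 39, 58, 63]

print(mean(data))    # 23.6
print(median(data))  # 14.0, the average of the middle pair (13 and 15)
# Every value appears exactly once, so there is no (useful) mode.
```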

Here we are given the values, but are not told what they represent, or what our goal is. But Henry has given thought to the *reason* for his choice (bravo!), and so has the teacher. Can we declare a winner? I didn’t:

I don't think there is a correct answer to the question. It depends on your point of view and the purpose of your measurement, as you stated nicely in your last paragraph. The question is, what do you mean by "represent"? For example, suppose you listed the annual pay for all the employees of a company. The mean would represent how much each would be paid if the total payroll were divided evenly among the employees. That might be the natural way for the manager to summarize the pay, since the total payroll is important to him. It well represents the effect of all the salaries on the bottom line. But if a few have very high salaries while most are not paid much, the mean would make it look as if everyone earned a lot of money, "on the average". The mean would be too greatly influenced by those "outliers". The employees, on the other hand, might consider the median to better represent their average pay, since it would show how much an "average" (typical) employee made (focusing on the individual rather than the bottom line). So which is most useful or important depends on your point of view; and each contains different information. The employees and the manager are both right; they just have different interests, and different ideas of what it means for an "average" to represent the data. (It just happens that the mean also favors the manager in making the company look generous, while the median favors the employees by making them look underpaid. But I don't think that's why they would tend to make the choices they would!)

(Sometimes people answer this question based on a somewhat cynical view that each would make the choice that favors his own position, and that is probably true in many situations. But I think it’s important to recognize that these are also *logical* choices, and both are objectively correct based on their interests.)

I applied this to Henry’s problem:

In your case, your teacher seems to be more like the employees, focusing on the individuals, while you are more like the manager, thinking of the whole. One way to clarify this is to diagram the whole distribution to get a better sense of how the numbers relate to one another. If there were more data I'd use a histogram, but I'll just use a "dot plot":

     oo    oo o o       o          o              o   o
    +-------+-------+-------+-------+-------+-------+-------+
    0       10      20      30      40      50      60      70
               ^       ^
            median    mean

Both measures are very much in the middle; the median is more "in the middle" of the individuals, while the mean is more "in the middle" of the whole.

I pointed out earlier that the **mean is highly influenced by outliers**: that is, just changing the highest or lowest salary will change the mean, while it will have no effect on the median. Henry seems to be saying the opposite: “the mean … is not *distorted* by any very high or very low numbers” because it “factors in all of the numbers, including the very low and very high numbers.” I suppose he is thinking of the fact that every number *contributes* to the mean; but I wouldn’t say that eliminates “distortion”.
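That sensitivity is easy to demonstrate in a few lines; in this sketch the salary figures are made up purely for illustration:

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last one is far above the rest.
salaries = [30, 32, 35, 38, 40, 45, 250]
print(mean(salaries), median(salaries))  # mean about 67.1, median 38

salaries[-1] = 1000                      # make the top salary even more extreme
print(mean(salaries), median(salaries))  # the mean jumps; the median is unchanged
```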

The difference lies in what each considers to be a “distortion”: “My teacher believes that the *mean* is distorted by the *bigger* numbers, however, I believe that the *median* is distorted by the *smaller* numbers.” Technically, this data set doesn’t have any outliers (isolated values far from the main body); but the relatively few high numbers do “pull the mean up”. On the other hand, the median is affected (directly) *only* by the middle numbers, so it really doesn’t make sense to say that small numbers distort it. Rather, I think he is saying that the concentration of many low numbers causes the median to be low. But using the word “distortion” assumes that there is some *right* answer that is being messed up, which is circular reasoning. Why shouldn’t the “average” be low, when many of the numbers are low? And why shouldn’t high numbers have an influence? All we really have here are two *different* ideas of average, not a right one and a wrong one!

Let’s see what I had to say about this:

My inclination is to agree with the teacher, if I had to take sides; the median also seems closer to what the mode would be if you were to group the data, since it is densest around 9-15. But I also notice that this is not quite the kind of situation I described for employees, where a very few numbers are far above all the others. It's hard to say that the many numbers clustered toward the low end "distort" the median (which is where most of the numbers are, anyway), or that the numbers smeared out toward the high end "distort" the mean (which is not so far away from the median).

The fact is that the median is closer to more of the data, and in that sense it represents the data better. But here is an interesting grammatical point: The word “data” (taken straight from Latin) is technically a *plural*, and if you take it that way (as I did just now in saying “more of the data”), then we are focusing on the *individuals*, and the median is best. But today many people don’t know Latin, and take data as a *singular*, referring to the whole collection. And from that perspective, the mean may be better!

Nevertheless, as I closed, I agreed with Doctor TWE:

So I still don't really know what it means to "best represent the data" without a context! This sort of question works better as an essay topic than a multiple choice.

Here is one more example, from 2003:

Mean or Median? In working on finding textbook readability for a math project, we have to find the mean number of words per sentence. One question asks: Why does the formula use the mean number of words per sentence instead of the median number of words per sentence? When I found both the mean and the median, they had the same value, 13. They are both measuring the center, so what's the difference? I think it's supposed to have something to do with outliers.

This turned into a long discussion with Doctor Ian about how each is affected by outliers, and under what conditions one is larger than the other. I will not quote at length, but he starts with a salary example like mine (emphasizing that the mean might be used just to give a good impression, but both are valid). Then he corrects a misunderstanding of the phrase “more resistant to outliers”, and points out some practical reasons why the mean might be preferred (namely, easier calculation). Then he gets to Jill’s question about why the mean and median are about the same for her data (it’s often a result of symmetry), and shows how either could be greater (depending on how the data are skewed). Finally, Jill shows her actual data set, which turns out to have an outlier at each end and to be otherwise fairly symmetrical.
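The symmetry and skewness points are easy to illustrate with small made-up data sets (these numbers are mine, not Jill's):

```python
from statistics import mean, median

symmetric  = [10, 12, 13, 13, 14, 16]  # balanced around the middle: mean == median
right_skew = [10, 11, 12, 13, 14, 60]  # a long right tail pulls the mean up
left_skew  = [1, 46, 47, 48, 49, 50]   # a long left tail pulls the mean down

for data in (symmetric, right_skew, left_skew):
    print(f"mean {mean(data):6.2f}   median {median(data):5.1f}")
```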

We never quite get to a direct answer to the project’s question: “Why does the formula use the mean number of words per sentence instead of the median number of words per sentence?” I have no idea what the answer might be; you’d have to ask the author! I probably would have wanted to ask for the entire statement in the project, to make sure I knew what “the formula” is, and where it came from. If it is referring to something like the Flesch readability index, I suspect Doctor Ian’s suggestion is right: **the mean is just easier to calculate** for a large piece of text, because you only have to count words and sentences. In my experience, that seems to be the *usual* reason for using the mean!

From time to time we get a question that is more about words than about math; usually these are about the meaning or origin of mathematical terms. Fortunately, some of us love words as much as we love math. But the question I want to look at here, which came in last month, is about *both* the word and the math; in explaining why the word is appropriate, we are learning some things about math itself.

Here is Christine’s question:

Why is an equation like 2x + 4 = 10 called a linear equation in one variable? Clearly, the solution is a point on one axis, the x-axis, not a line on the two-axis Cartesian coordinate plane? Or are all linear equations in one variable viewed as vertical lines?

Clearly, she is saying, the phrase “linear equation” means “the equation of a line”. And there is a similarity between the linear equation in *one* variable above, and a linear equation in *two* variables, such as \(y = 2x + 4\), which definitely is the equation of a line. But with only one variable, the only way to say that \(2x + 4 = 10\) is the equation of a line is to plot it on a plane as the vertical line \(x = 3\). Is that the intent of the term?

No, it goes further than that, because you also have to consider three or more variables. I initially gave just a short answer, to see what response it would trigger before digging in deeper:

The term “linear”, though derived from the idea that a linear equation in two variables represents a line, has been generalized from that to mean that the equation involves a polynomial with degree 1. That is, the variable(s) are only multiplied by constants and added to other terms, with nothing more (squaring, etc.). So you could say that the term has been taken from one situation that gave it its name, and applied to more general cases with different numbers of variables. A linear equation in three variables, for example, represents a plane.
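As a small aside on the one-variable case, "degree 1" is exactly what makes such equations easy to solve: one subtraction and one division. A minimal sketch (the helper name solve_linear is my own, not from the discussion):

```python
def solve_linear(a, b, c):
    """Solve a*x + b = c for x; being degree 1, one division suffices."""
    if a == 0:
        raise ValueError("a must be nonzero for the equation to be linear in x")
    return (c - b) / a

x = solve_linear(2, 4, 10)
print(x)  # 3.0 -- the vertical line x = 3, where y = 2x + 4 meets y = 10
assert 2 * x + 4 == 10
```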

From my perspective, “linear” means far more than “having a graph that is a line”. My first thought when I see the word (outside of an elementary algebra class) is “first-degree polynomial”. Although one initially connects it to straight lines, when we extend its use (and almost every idea in math is an extension of something simpler), the idea we carry forward is the degree, not the number of dimensions. For example, here is the beginning of the Wikipedia article about linear equations:

A linear equation is an algebraic equation in which each term is either a constant or the product of a constant and (the first power of) a single variable (however, different variables may occur in different terms). A simple example of a linear equation with only one variable, x, may be written in the form: ax + b = 0, where a and b are constants and a ≠ 0.

Although this article is about linear equations in general, it starts with one variable, not with lines. And although it shows the graph of a line, the second paragraph skips over the two-variable case, right to three variables:

Linear equations can have one or more variables. An example of a linear equation with three variables, x, y, and z, is given by: ax + by + cz + d = 0, where a, b, c, and d are constants and a, b, and c are non-zero.

But why would we call any polynomial equation with degree 1 “linear”, when it is only in two dimensions that it is related to a line? In the one-variable case, as Christine said, the graph is really a **point**; and in this three-variable case it is a **plane**:

Christine wasn’t quite convinced:

Thank you so much. I am very particular with terminology when I teach mathematics. I have to say, I do not like this generalization of the term. I think it is misleading.

Hmmm … *is* the term “linear” misleading? Not to mathematicians, and I would hope not to students once they get used to it. Yet it’s true that it doesn’t quite mean what it seems to say. And *is* generalization bad? I think it’s the essence of what math *is* – and also an integral part of languages, which are always extending the meaning of words to cover new needs.

Here is my response:

Hi, Christine.

Thanks for writing back with additional thoughts. Let’s think a little more deeply about it.

First, this is standard terminology that has been in use for 200 years by many great mathematicians, so we should be very careful about considering it a bad idea. I don’t think you’re likely to convince anyone to change; the word is used not only of a *linear* equation in itself, but of “systems of *linear* equations” (in contrast to “*non-linear* equations”); of the whole major field of “*linear* algebra”; and for related concepts like “*linear* combination”, “*linear* transformation”, and “*linear* independence”, all of which apply to any number of variables or dimensions. So the term is very well established with a particular but broad meaning, and (at least after the first year) no one is misled by it. We know what it means, and what it means is more than just “line”.

Second, what would you replace it with? If you don’t want to use the word “linear” except in situations that involve actual lines, what word would you use instead to describe the general class of equations involving variables multiplied only by constants, regardless of the number of variables? We need a word for this bigger idea; that word will either be a familiar word whose meaning is stretched to cover a bigger concept, or some made-up word. The tradition in math has always been to take familiar words and give them new meanings (either *more specific*, like “group” or “combination” or “function”, or *more general*, like “number” or “multiplication” or “space”). So what we observe here is found throughout mathematics: a word that has grown beyond its humble beginnings. (This is also true of the entire English language! *Most* words would be “misleading” if you thought too deeply about their origins.)

I could say that, in a sense, all of mathematics is about generalization (or abstraction). I just mentioned “number”; some people do complain about calling anything other than a natural number a “number”, but the logical development from natural numbers, to integers, to rational numbers, to real and complex numbers, involves repeated broadening of the term, which has been extremely useful. We invent new concepts, and give them old names because they are a larger, more powerful version of the old concept.

I mentioned how common it is in English for a word to grow beyond its original meaning. Sitting here at my computer, I look at the *mouse* – is it misleading to call it that when it doesn’t have legs, and may not even have a tail? And the computer has a *screen*; at one time, a screen was a flat surface that *hid* something that wasn’t to be seen (or that kept out bugs); then it was applied to flat surfaces on which pictures were projected; and then to a surface that *shows* a picture itself. Is that misleading? It would be if you went back a hundred years …

And thinking again about math terms, I’m reminded of this discussion where I pondered what all the different operations that are called “multiplication” have in common.

But as I wrote this, I realized that I had gone off in a different direction than Christine, and I wanted to relate my answer to her specific context, linear equations in one variable. The real question was pedagogical: How could she explain this to her students, so that (eventually) “linear” would mean to them what they will need it to mean? I continued:

Now, I’ve been mostly thinking of the “enlarging” development of the word “linear”, taking it to *more* than two dimensions; you’re thinking specifically of the term used with *only one* variable. So it may help if we focus on that, to fit your particular context. I’d like to explain linear equations in one variable in a way that should make it clear why we use the word, and that it is not a misnomer.

Consider the equation you asked about, 2x+4=10. The left-hand side is an expression; we call it a *linear expression*, because if you used it in an equation with two variables, y = 2x+4, its graph would be a straight line. So we call 2x+4 a linear expression (or, later, a linear function). A linear equation in one variable is one that says that two linear expressions are equal (or one is equal to a constant).

Or, to look at it another way, one way to solve this equation would be to graph the related equation y = 2x+4, and find where that *line* intersects the line y = 10. So the linear equation can be thought of very much in terms of a line.

(If this were a student’s first exposure to linear equations in one variable, she likely wouldn’t have seen graphs of lines yet, and wouldn’t be ready for this discussion; either the word linear would be used without explanation yet, or we would hold off on the word until later.)

Does this help?

It’s often hard to be sure what kind of answer will help; but Christine gave the answer I hoped for:

Yes, your response does help very much. In addition, it gives me a lot to think about. It is true, the grade level I am currently teaching does learn to solve linear equations in one variable before they learn about linear equations and functions. Perhaps this is why I question the use of the term. I must remember that I have had experience in mathematics beyond their years, and therefore am more thoughtful about what terms they are exposed to, and more importantly, in what sequence the mathematics is presented to them. I thank you again for such an in-depth response and will certainly give this discussion much more thought. It is a pleasure speaking with you.

I imagine if I were introducing these equations to students with no exposure to graphs of lines, I might just mention in passing that these are called “linear equations in one variable”, and that we would soon be seeing why the word “linear” is appropriate. For now, what it means is that all we do with the variable is to multiply it by a constant, and to add things. Kids are used to hearing things they’ll understand later …

An interesting question that has been referred to many times since it was written in 1999 deals with averaging angles. At first the question seems trivial; then almost impossible; and then we end up with a rather simple formula that is totally unlike what we started with. And further applications lead to new issues that make it continue to interest us.

Here is the question:

Averaging Two Angles How do you take the average of two or more angles? The average of 179, 180, and 181 is 180, but the average of 359, 0, and 1 is not 180.

Dave poses a seemingly simple question, but briefly states what troubles him about it. Averaging is averaging, right? The average of 179, 180, and 181 is obviously 180, and the calculation (if you need it at all) is easy: \(\frac{179 + 180 + 181}{3} = \frac{540}{3} = 180\). Unfortunately, because he mistyped his counterexample (possibly confusing it with the average of 359 and 1 alone), it took a little discussion before I saw his point. All I did initially was to show how to find the averages, so he could compare that to what he did:

Hi, Dave. To find an average, or mean, of a set of numbers, you add up the numbers and then divide by the number of numbers you added. In your examples, there are three numbers, so you divide the sum by 3: (179 + 180 + 181)/3 = 540/3 = 180, and (359 + 0 + 1)/3 = 360/3 = 120.

He was right that the second answer wasn’t 180, but why was that important? It seems that he wrote 180 where he meant 120, and the problem he saw was that 120 didn’t make sense as the answer. He had to clarify:

Right, but that does NOT take the average of three ANGLES. The average of the three angles in the second example should be 0.

Ah! So if you have angles 359°, 0°, and 1° and draw them together, they are all 1 degree apart, just as 179°, 180°, and 181° are; but in the latter case averaging yields the angle you expect (the one in the middle), but in the former it doesn’t. I started to see the point:

I missed your point because you'd written 180 instead of 120. I think it might be helpful if you gave me some context - why do you want the average? The meaning will be different in certain kinds of problems, depending for instance on whether you are looking at the angles as a turn through 359 degrees, or as a direction 359 degrees from north. As you stated the question, I would still just average the numbers, as I did. If something turns, say, 359 degrees to the left in the first hour, stays stationary in the second hour, and turns 1 degree to the right in the third hour, then on average it has turned 120 degrees per hour. Presumably there is a reason for giving the angle as 359 rather than as -1 degree; in the latter case, of course, the average would be zero. In this sense, there is a difference between -1 and 359 degrees.

Angles can be thought of in several different ways. In geometry, we think of an angle as a mere **figure** composed of two rays, or as the (unsigned) “distance” between them. In trigonometry, we think of it as a **rotation** (clockwise or counterclockwise): a (signed) “motion”, so that different motions can result in the same final geometrical configuration (coterminal angles). In navigation, we think of it as a **direction** (relative to north), and do not distinguish coterminal angles.

In terms of **rotations**, it makes perfectly good sense to average angles just as we average numbers. In the following picture, to make things more visible, I have changed the angles to 350°, 0°, and 10°; if we start at A, rotate clockwise 350° to B, then 0° (staying at B), then 10° to C, we have rotated a total of 360°, and on average we have rotated 120° per move.

If the first rotation were -10°, the total rotation (and therefore the average) would be 0°.

But Dave is not thinking in terms of rotations! He is thinking of **locations** on a circle, or directions, or something of that sort:

Here, we just have a point A at 350° from north (measuring angles as in headings in navigation), B at 0°, and C at 10°. Clearly, the average in this sense ought to be 0°. From this perspective, 350° is the same as -10°; signs and rotations are not the issue, just where you end up. So now I could start pondering the problem Dave had posed:

But you've raised an interesting question: if we think of the angles just as directions, then 359 and -1 should mean the same thing, and the average should not depend on how I state it. This is an inherent problem with angles, and this ambiguity shows up in other places, notably in working with complex numbers, where for example we divide an angle by 3 to find the cube root, and there are actually three answers 120 degrees apart, much as in this problem. Let's look at your problem in terms of wind direction. Suppose the wind is blowing at a constant speed, but one day it blows from 359 degrees, the next from 0 degrees, and the next from 1 degree. What is the average wind direction? If the answer isn't 0 degrees, something's wrong.

Now I have a model for what Dave might intend to do with the average. (He never did tell me, though others later did state specific applications.) Thinking about what it would mean to average directions, I decided that the most reasonable would be in terms of vectors: Add the n unit vectors representing each direction, and divide the resultant vector by n.

Here I have added vectors **a**, **b**, and **c** by putting them end to end, and the resultant vector, OE, clearly is in the direction of B as expected. (Dividing by 3 has no effect on the direction, so I don’t really need to do that for our purposes.)

I first explained with reference to the three directions in the question:

I would approach this in terms of vectors. The wind velocities can be represented by vectors V1 = (V cos(359), V sin(359)) = (0.99985V, -0.0175V), V2 = (V cos(0), V sin(0)) = (1, 0), and V3 = (V cos(1), V sin(1)) = (0.99985V, 0.0175V). Now if we add these vectors and divide by 3, we get (V1 + V2 + V3)/3 = (2.99970V/3, 0) = (0.99990V, 0), and this vector has direction tan^-1(0) = 0 degrees, as expected. We can also see that because the wind turned, the average strength in this direction is a little less than 1. If you haven't seen vectors or trigonometry yet, this may suggest how valuable they are!
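To make the vector method concrete, here is a minimal Python sketch of it (the function name `mean_direction` is my own label, not from the original discussion):

```python
import math

def mean_direction(degrees):
    """Average angles treated as directions: sum the unit vectors
    for each angle, then take the angle of the resultant vector."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360

# 359, 0, 1 average to (essentially) 0 degrees, not 120:
print(mean_direction([359, 0, 1]))
# 179, 180, 181 average to 180, agreeing with the ordinary mean:
print(mean_direction([179, 180, 181]))
```

Using `atan2` rather than `arctan` sidesteps the quadrant adjustment discussed later on the page; floating-point rounding can make the first result come out as a value negligibly below 360 rather than exactly 0.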

Dave wasn’t familiar with the notation \(\tan^{-1}(x)\), which is also called \(\arctan(x)\), the inverse tangent, but he was satisfied with the answer as meeting his needs.

Four years later, another visitor, Larry, read this answer, and wrote to ask why he wasn’t getting the right answers. This happens fairly often!

I tried using the method you give; however, I get a close answer, but usually not an exact answer. For instance, when I average 5 degrees and 15 degrees, the average should be 10 degrees, but instead I obtain 9.814 degrees. I used the scientific calculator that comes with Windows Office Professional software, which is accurate to 33 places. Being almost 0.2 degrees off is significant as compared to the correct answer. Having a calculator that has tables correct to 33 places makes me believe that the answer should be more accurate than this. Maybe there is a problem with the tables in the calculator that I am using. If so, is there a calculator that is more accurate?

The issue can’t be mere accuracy; he must be doing something wrong. I took the occasion to rewrite the formula without explicit reference to vectors, to make it easier for others, and then checked that the calculation gave the correct result:

The method given [on that page] can be put into a formula this way: A = arctan((sum of sines of angles)/(sum of cosines of angles)). For your example, we have (sin(5) + sin(15))/(cos(5) + cos(15)) = (0.08716 + 0.25882)/(0.99619 + 0.96593) = 0.34597/1.96212 = 0.17633, so A = arctan(0.17633) = 10. I presume you did something different; if you want help to see what you did wrong, please tell me how you tried to apply the method there. I should mention for completeness that the arctan (inverse tangent) always gives an answer between -90 and +90 degrees, so if the angles you are working with are outside that range, you will have to adjust, by finding the angle in the correct quadrant that has the indicated tangent.

I don’t know what he did to get his wrong answer; he never responded. I tried doing the calculations with my calculator in radian mode (the usual culprit in this sort of formula), and got an answer of 0.575 radians (32.9°). This may look wrong, but in fact, if we add 3π to 0.575, we get 10 radians, showing that this answer is coterminal with the expected average. (In radians, 10 and 15 are rather large, outside of the usual range.) But this is not what Larry got. Nothing else I tried gave 9.814°. So this is a mystery. (Let me know if you figure it out.)
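The radian-mode experiment is easy to reproduce. Here is a quick Python check (Python's `math` functions, like most languages', expect radians):

```python
import math

# Apply the formula to 5 and 15 as if they were radians, not degrees:
s = math.sin(5) + math.sin(15)
c = math.cos(5) + math.cos(15)
a = math.atan(s / c)

print(a)                 # about 0.575 radians (roughly 32.9 degrees)
print(a + 3 * math.pi)   # about 10, numerically matching the expected average
```

This reproduces the 0.575-radian answer described above, but still not Larry's 9.814°.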

A couple months later, though, another visitor, Stephen, wrote to say

If you're writing a program or spreadsheet to do this, atan2 is a better choice than atan, as it gets the signs correct automatically.

This was a helpful suggestion for anyone programming this formula. Computer languages commonly provide this special function that takes x and y coordinates and finds the angle to that point directly, rather than through \(\arctan\left(\frac{y}{x}\right)\), which as I had mentioned doesn’t always get the quadrant right. So I wrote up a final version of the formula:

Hi, Stephen. That's true, though the person who wrote was using a calculator. If you were using a programming language or spreadsheet program that supports the "atan2" function, for which atan2(x,y) = arctan(y/x) with the appropriate sign according to the quadrant of the coordinates, then a better formula would be A = atan2(sum of cosines, sum of sines). Be careful, however: the definition above is for Excel; in C++, it's atan2(y, x), with the order of arguments reversed. The function is not quite standard everywhere. For more about arctan and atan2, see: Arctan and Polar Coordinates http://mathforum.org/library/drmath/view/54114.html

This last link demonstrates how to get the right quadrant if you don’t have atan2, and then briefly mentions the function. It should also be mentioned that in any programming language these functions [almost] always work with radians rather than degrees.
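A small Python demonstration of why `atan2` is the better choice (Python follows the C convention, `math.atan2(y, x)`; Excel's `ATAN2` takes the arguments in the reverse order; the example angles are my own):

```python
import math

# Two directions straddling 180 degrees; their average should be near 180.
angles = [170, 190]
s = sum(math.sin(math.radians(a)) for a in angles)
c = sum(math.cos(math.radians(a)) for a in angles)

naive = math.degrees(math.atan(s / c))    # lands near 0: wrong quadrant
correct = math.degrees(math.atan2(s, c))  # lands near +/-180: right quadrant
print(naive, correct)
```

Because arctan alone cannot distinguish a direction from its opposite, the naive formula is off by exactly 180° here; `atan2` uses the signs of both sums to pick the correct quadrant.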

Among others who later wrote to us about this (and went unarchived for one reason or another), a couple were looking for a way to average wind directions, and one wanted to average the direction of a plane (in order to ignore small deviations from course and record the overall direction). The latter, I felt, really should just average angles as numbers, with some adjustment when crossing from 1 to 359. The former probably did need the vector method, but needed to recognize that there can be cases where the average (of 0° and 180°, for example) is undefined, because the vector sum is 0, which has no direction.
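That degenerate case is easy to guard against in code: check the length of the resultant vector before taking its angle. A sketch, where the helper name and the tolerance are my own choices:

```python
import math

def mean_direction_or_none(degrees, eps=1e-9):
    """Vector-average of directions, or None when the resultant vector
    is (numerically) zero-length, so the average is undefined."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    if math.hypot(s, c) < eps * len(degrees):
        return None  # e.g. 0 and 180 degrees cancel exactly
    return math.degrees(math.atan2(s, c)) % 360

print(mean_direction_or_none([0, 180]))  # None: opposite directions cancel
print(mean_direction_or_none([0, 90]))   # halfway: 45 degrees
```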

Bottom line: if you want to average directions specified as angles (such as headings or bearings), in Excel, you can use this formula: **A = atan2(sum_of_cosines, sum_of_sines)**.

One of these later writers, Nick, wrote in late 2003 (unarchived),

In http://mathforum.org/library/drmath/view/53924.html you discuss two methods for averaging angles. The first method is to calculate a simple arithmetic average, eg: (5 + 15) / 2 = 10 [You need a special rule so that, eg, the average of 1 and 359 is (1 + -1) / 2 = 0. That's OK.] The second method is to treat the angles as vectors, eg: atan((sin(5) + sin(15)) / (cos(5) + cos(15))) = 10 So, for two angles, the two methods seem to be equivalent. Interestingly, for more than two angles the two methods seem to result in similar, but not identical results. So, for example, if we want to average 5, 13 and 15 the first approach gives 11, the second approach gives 11.00244215 (using Excel, or the windows calculator). This difference seems too large to be just a numerical artifact, so I am curious as to how to interpret the two different results!

I replied,

This doesn't surprise me much; it just reflects the difference between linear and circular geometry. I would expect the two to be different. As I explained, different ways of averaging are appropriate in different contexts. Thinking of the angles as directions, you want to use the vector approach. Here you are working in a fully two-dimensional setting, and want to add the directions as vectors. But if you are just thinking of, say, angles as positions on a circle, you are using the circle as if it were merely a curved ruler, and they will add like ordinary numbers. I played with this a bit and found that the difference between the two methods increases when the angles are less symmetrical, like 0, 80, and 90 as opposed to 0, 45, and 90. With two angles, you are always symmetrical, and both methods will give the same answer. Try drawing the vector addition graphically, and you will get a sense of why there is a difference.
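Nick's experiment is easy to repeat. Here is a short Python comparison of the two means (using the same vector formula discussed above; the particular angle sets are illustrative choices):

```python
import math

def vector_mean(degrees):
    """Mean direction via the sum of unit vectors."""
    s = sum(math.sin(math.radians(d)) for d in degrees)
    c = sum(math.cos(math.radians(d)) for d in degrees)
    return math.degrees(math.atan2(s, c)) % 360

for angles in ([5, 15], [5, 13, 15], [0, 45, 90], [0, 80, 90]):
    arith = sum(angles) / len(angles)
    print(angles, arith, round(vector_mean(angles), 5))
```

For two angles, or for symmetrically spaced ones like 0, 45, 90, the two methods agree; for 5, 13, 15 the vector mean is about 11.0024 rather than 11, and for the less symmetrical 0, 80, 90 the gap grows to a few degrees.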

He replied,

You make an interesting observation that the results diverge as the angles become less "symmetrical". Further experimentation shows that the results are essentially identical if the differences between the angles are small - eg 20.0, 20.1 and 20.5 give the same result by either method. A colleague suggested that the fact that the sine of a small angle is approximately equal to the angle itself is somehow involved. Perhaps I should attempt a proof that the two methods are equivalent in this case (although my trigonometry is not that great). Anyway, the context in which I was interested is in surveying, in which you may wish to repeatedly measure an angle subtended by three points, and calculate the mean and standard deviation. In this case, the angles to be meaned have a range of only a few seconds, and so the two approaches would seem to be equivalent.

My response:

The comment about the sine is exactly what was in my mind, though it's hard to express it clearly. It is this similarity and difference -- that the sine, or chord, is very close to the angle, or arc, but not quite -- that is behind the subtleties of trisecting an angle; people often think they have found a way to do it, but they are really trisecting a chord, or something like that, so that it is very close for some cases but not for others. Something like that is happening here. Certainly your observation with small differences between angles is true: in the limit, both methods will be equivalent. And I think you are right that you can use either method in your situation. You could probably analyze the situation more closely and might find a third method that better fits what you are doing, but it wouldn't be worth the effort!

It’s always interesting to see what others do with the ideas we put out there.
