First, from an anonymous student in 2002:

Negative 5 and Minus 5 Can you tell me the difference between negative five and minus five? When should I use negative? When should I use minus? If I have a lot of numbers, e.g. +3, -7, -3/4, -5.4, 1/4....... How to read them? Thank you!

Rather than give a routine answer, Doctor Achilles started with a bit of history:

Thanks for writing to Dr. Math. That's a very good question. It's very easy to get negative 5 and minus 5 confused because people often say "minus 5" when they should actually say "negative 5." I'm going to answer this in a sort of roundabout way that I hope will help get the point across. A long, long time ago, people thought only in terms of the natural numbers: 1, 2, 3, 4, 5, ... You can do a lot with the natural numbers. You can count how many sheep, dollars, children, and friends you have. You can count how many apples you have. You can even do simple arithmetic: if you have 6 apples and you lose half (divided by 2), you end up with 3 apples; if you have 6 apples and you double the number (times 2), you end up with 12 apples; if you have 6 apples and you get two more (plus 2), you have 8; if you have 6 apples and you lose two (minus 2), you have 4. The idea of subtraction (or minus) has been around since the days before fractions and integers, when people only thought in natural numbers.

Before there were negative numbers, there were operations. Subtraction was read as “minus”, which is Latin for “less”, another term that is sometimes used: “nine *minus* five” or “nine *less* five”. (Adults don’t say “nine take-away five”.)

If I say "nine, five" to you, all I have done is list two numbers: 9, 5. I haven't told you anything about what to do with those numbers. The most you can do is just remember them. You don't know whether I want you to multiply them, divide them, add them, or just remember both of them (one could be the number of oranges I have and the other the number of cousins I have). If I say "nine plus five" to you, then I have told you to add two numbers. You don't have to remember 9 and 5, you just have to know that the answer is 14.

The point is that the words “plus” and “minus” represent **operations**.

Somewhere along the line, some people came up with zero (and that was a big deal) and some other people came up with fractions (and that was a big deal). Then some other people came up with integers. Integers are all the positive (natural) numbers, and zero, and the negative numbers: ..., -3, -2, -1, 0, 1, 2, 3, ... It's hard to think of apples and sheep in terms of negative numbers, but they are useful in thinking about money and other things. For example, right now, my bank account has a negative amount of money in it. Negative numbers are just a kind of number, not really any different from positive numbers.

The word “negative” describes a **number**.

If I say "nine negative five" to you, all I have done is list two numbers: 9, -5. I haven't told you anything about what to do with those numbers. The most you can do is just remember them. You don't know whether I want you to multiply them, divide them, add them, or just remember both of them (one could be the balance in my friend's bank account and the other the balance in my bank account). If I say "nine minus five" to you, then I have told you to subtract 5 from 9. You don't have to remember 9 and 5, you just have to know that the answer is 4. So "negative 5" is a number and "minus 5" is a mathematical operation you can do to another number. If it helps, you can think of "negative 5" as a noun, as in the sentence "negative 5 is my least favorite number"; and you can think of "minus 5" as a verb as in the (ungrammatical) sentence "I want you to take 9 and minus 5 it."

Unfortunately, people do often use “minus” in place of “negative”; my dictionary even gives the example “minus ten degrees”. It can be hard to keep straight!

One other thing. This tends to get complicated when you start doing crazy things like: "negative 5 plus 6" or "8 minus negative 5". If you want, you can check out this page for my suggestions of how to deal with this: Tips for Negative and Positive Numbers http://mathforum.org/library/drmath/view/57873.html I should note that I'm a bit sloppy with the distinction between "minus" and "negative" on that page.

He isn’t very sloppy; he just uses “minus” to refer to the *symbol*, however it is being used.

He closed by giving the words for +3 ,-7, -3/4, -5.4, 1/4 in the question:

For the last part of your question, the list you gave reads: "positive three" OR "three"; "negative seven"; "negative three-fourths"; "negative five point four"; "one fourth" OR "one quarter". Hope this helps. If you have other questions about this or anything else, please write back.

The student did write back:

Dear Dr. Math, I think your answers are great help to me. Thanks again. Can you answer another question? I'm reading a book called _Exploring Mathematics_. It says "Weather-forecaster: The temperature fell to 'minus five degrees'." Is it right? Thank you. Best regards!

Doctor Achilles replied,

In English it is not uncommon to hear "negative 5" called "minus 5." This is technically incorrect, but if you're not speaking with mathematicians, then it's acceptable to just use "minus 5."

Everyday language is not precise.

Our second question, from Chris in 2005, is about the symbol:

"Subtraction" and "Negative"--Same Sign, Different Concepts? Is there a difference between the operation "-" when used in the expression "y = a - b" as compared with its meaning in the expressiony = -b? I don't doubt the truth of the statement 0 = -1 - (-1) but the reasoning I have seen has never quite convinced me, because it seems that the - sign isbeing used to mean two different things. On a number line these two things are something like 1) -b => Rotate the following number by 180 degrees. 2) a - b => Continue (or count) in the direction of the following number, rotated by 180 degrees. Now while these are fairly similar they are not the same thing. Taking another approach, we are allowed say "y = a × b" but we are not allowed to say "y = × b".So "-" is being used in a "syntactically" different way to "×".This indicates to me that we are allowing "-" to be used to mean two different things. I think that, as it happens, the rules for combining the two different meanings of the sign allow us to get away with saying "-(-) = +" without distinguishing between the two, because it just happens to work out like that. The reason I'm so hung up on this is that I think if wetemporarily used new signs- say,p and n, for positive (no rotation) and negative (180 degree rotation)--and continued to use+ and -for the operations of counting on or counting with rotation, it all becomes a lot easier to explain what is actually happening. That is, "pa - pb" can be shown to be the same as "pa + nb" and also it is much easier to grasp that "pa - (nb) = pa + pb". Is there anything to all this?

We often think of the negative sign just as part of the name of a number (which is how Doctor Achilles used it in his answer above); Chris rightly sees it as an **operation** (the “negative of” that we talked about last time). But that means we are using one symbol to represent two different operations. Would it be helpful to use two different symbols?

I answered:

Hi, Chris. You're exactly right that "minus" and "negative" are two different operations; that's why they have different names (and we _try_ to get kids to use the right names ;-). "Minus" is a binary operation (performed on two numbers), while "negative" is a unary operation (performed on a single number). They have the same symbol because they are _very_ closely related, and it doesn't cause any trouble to use the same symbol.

This is, ultimately, the same idea Doctor Achilles was emphasizing: Minus and negative are different things. But now we need to explore how they turn out to be related so closely that, even though we use **two different words**, we can use the **same symbol**. And distinguishing them will help us to see this!

However, some texts have done just what you suggest, and used different symbols for them; one common choice is a raised "-" to mean negative, so an equation might look like a - b = a + ⁻b. This does help some students to get a better sense of the distinction before they move on to the normal notation.

This notation looks like “\(a-b=a+{}^-\!b\)” or “2 + ⁻3 = ⁻1”. It isn’t the only notation that’s been used.

You may recall that two weeks ago we saw that DeMorgan represented negative numbers by a bar over the number, as seen here (p. 46):

His \(\overline{4}\) wouldn’t work so well as an operation, but was a reasonable attempt to distinguish negation from subtraction. The same idea has been used in balanced ternary notation, e.g. \(10\overline{1}_{bal3}=1\cdot 3^2+0\cdot 3^1+(-1)\cdot 3^0=8\).
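To make the balanced ternary example concrete, here is a minimal Python sketch that converts between ordinary integers and balanced ternary digits. The function names, and the choice to write the barred digit \(\overline{1}\) as the integer -1 in a list, are assumptions made just for this illustration; they are not a standard notation.

```python
def from_balanced_ternary(digits):
    """Evaluate balanced ternary digits, most significant digit first.

    Each digit must be -1, 0, or 1; the barred digit is written as -1.
    """
    value = 0
    for digit in digits:
        if digit not in (-1, 0, 1):
            raise ValueError("balanced ternary digits must be -1, 0, or 1")
        value = value * 3 + digit
    return value


def to_balanced_ternary(n):
    """Return the balanced ternary digits of the integer n, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        remainder = n % 3          # 0, 1, or 2 in Python, even for negative n
        if remainder == 2:
            remainder = -1         # use the "barred" digit instead of 2
        digits.append(remainder)
        n = (n - remainder) // 3
    return digits[::-1]


print(from_balanced_ternary([1, 0, -1]))  # 8, as in the example above
print(to_balanced_ternary(8))             # [1, 0, -1]
```

Notice that the sign lives inside the individual digits here, so no separate negative sign is needed for the number as a whole.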

Cajori mentions several mathematicians who used each of these notations:

I might also add that scientific calculators have separate keys for the two operations; on those that most closely mimic written math notation, the "negative" key is commonly labeled "(-)" to distinguish it from the "minus" key, "-".

Here are two such calculators, with green “minus” and yellow “negative”:

Other calculators, in which operations apply to the number in the display rather than being entered as they would be written, use a “change sign” button instead (red here):

Computer programming languages don't need to make that distinction, since they can determine the meaning from context; I'm not sure why calculators don't do the same, but it's probably due to little differences in the way users think of keys on a calculator vs. characters on a page.

Here’s one difference between calculators and programming, that explains the need for the separate button: In a program, each statement has a definite start, so if you type, say “– 2 – 3”, it is clear that the initial “–” is a negative sign, because there is nothing before it. But in many calculators, if you enter “– 2 – 3”, it will assume you are continuing a calculation using whatever was previously calculated, so that there is in fact something before the sign; some, for example, will display “(ans) – 2 – 3”. If the display previously showed 0, the result will be what you expected, but otherwise you will be subtracting 5 from the previous result. The separate (–) button prevents this from happening.
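As one illustration of how a language tells the two apart from context, here is a small Python sketch that uses the standard `ast` module to count how many "-" signs in an expression were parsed as the binary "minus" and how many as the unary "negative". The helper name `count_minus_roles` is made up for this example, and other languages may organize their parsers differently.

```python
import ast


def count_minus_roles(expression):
    """Return (binary_minus_count, unary_negative_count) for a Python expression."""
    tree = ast.parse(expression, mode="eval")
    binary = 0
    unary = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub):
            binary += 1   # "-" sitting between two operands: subtraction
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            unary += 1    # "-" with nothing on its left: negation
    return binary, unary


print(count_minus_roles("-2 - 3"))    # (1, 1): one negative, then one minus
print(count_minus_roles("8 - -5"))    # (1, 1)
print(count_minus_roles("2 * (-3)"))  # (0, 1): multiplication by a negative
print(count_minus_roles("2 - 3"))     # (1, 0)
```

The same character is typed each time; only its position in the expression decides which operation it represents.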

Aside: A similar issue arises with students, who may confuse \(a-b\) with \(a(-b)\), especially in larger expressions. How can you tell whether “–” is a subtraction, or multiplication by a negative? This was discussed near the end of Order of Operations: Subtle Distinctions; the answer, as I just explained for computers, is this:

If an expression *can* be interpreted as a subtraction, it *is* a subtraction. For example, \(2-3\) is a subtraction because the “–” is between two numbers; but \(2(-3)\) can only be a multiplication, because there is no number on the left of the “–”. To show multiplication by a negative number on the right, the parentheses are *required*.

On the other hand, there are some programming languages with very different syntax, in which the distinction is needed. I vaguely recall the “high minus” symbol (raised negative) from my long-forgotten introduction to APL (one of the first languages I learned, in the 60’s); it is used like DeMorgan’s bar, not as an operation, but as an integral part of the numeric “literal”.

Regardless of whether we use distinct notations for the two operations, it is important to distinguish the operations themselves. The negative is called the "additive inverse", and is defined by a + -a = 0. That is, the additive inverse of a is the number you can add to a to get 0. Having defined that, we actually _define_ subtraction in terms of this: a - b = a + -b. That is, we define subtraction as _meaning_ addition of the additive inverse; so, as you said, subtraction means "turn around on the number line and walk the given distance in the opposite direction". This is the connection between the two operations, and the reason it is helpful to use the same symbol. When I read "a - b", I see it as "a + -b", because addition has important properties, such as commutativity, that subtraction lacks, making it very useful to forget about subtraction and use only addition.

This definition of subtraction as “adding the opposite” can’t be used before students have learned about negatives, or in contexts dealing only with natural numbers. There, we define subtraction as the inverse operation of addition. That is, the difference \(a-b=c\) is the (unique) number \(c\) such that \(a=b+c\). Subtraction “finds the missing addend”. This produces the same result as adding the negative because, using the definition I gave above, we can see (1) that \(c=a+-b\) satisfies the definition: $$b+c=b+(a+-b)=b+(-b+a)=(b+-b)+a=0+a=a$$ and (2) that if \(a=b+c\), then, adding \(-b\) to both sides, $$a+(-b)=(b+c)+(-b)=(c+b)+(-b)=c+(b+-b)=c+0=c$$ so that \(c\) is unique.
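The idea of treating "negative" as a unary operation and "minus" as a binary operation built from it can also be sketched in code. The function names `negative` and `minus` below are chosen only to match the words used in this post; the sketch simply leans on Python's built-in arithmetic to check the definitions on a few sample values.

```python
from fractions import Fraction


def negative(a):
    """Unary operation: the additive inverse of a, so that a + negative(a) == 0."""
    return -a


def minus(a, b):
    """Binary operation: subtraction, defined as adding the additive inverse of b."""
    return a + negative(b)


samples = [9, 5, -1, 0, Fraction(-3, 4), Fraction(1, 4)]
for a in samples:
    assert a + negative(a) == 0          # defining property of the negative
    assert negative(negative(a)) == a    # turning around twice
    for b in samples:
        assert minus(a, b) == a - b                  # agrees with ordinary subtraction
        assert a + negative(b) == negative(b) + a    # addition commutes

print(minus(9, 5))    # 4
print(minus(5, 9))    # -4, so subtraction itself does not commute
print(minus(-1, -1))  # 0, that is, -1 - (-1) = -1 + 1
```

Rewriting every subtraction as an addition is what lets us reorder terms freely, which is the practical payoff of the definition.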

Once we have the concept of additive inverses (which necessitates knowing about negative numbers), algebra becomes far easier. Think about the fact that algebra existed before negative numbers were accepted, so that (as we saw two weeks ago) we needed separate cases for everything!

(Note that if we used a very different symbol, like your "n", it would be harder to see the connection and to learn to see it this way: it is not obvious that a - b = a + nb.)

This is the benefit of “raised minus” notation: It’s enough like regular minus to see the connection, yet distinct enough to help students.

There is a similar issue with regard to multiplication and division. If we wanted, we could use the division symbol "/" to indicate the multiplicative inverse (also called the reciprocal): a * /a = 1. We define division as multiplication by this inverse: a / b = a * /b in much the same way as for subtraction. For some reason this has never caught on, as far as symbols are concerned. Similarly, although we can use "+" as a unary operator (which has no effect on a number, as +a = a, meaning 0+a), we don't happen to use multiplication, "*", in the same way, so that *a = a, meaning 1*a. It wouldn't hurt to do so, but has not been found useful!

In reality, rather than this “raised division”, \(a\div b = a\times ^\div\!b\), we use a negative exponent to indicate a reciprocal: \(a\div b = a\times b^{-1}\). This could be used, for example, in Order of Operations: Common Misunderstandings.

Getting back to negatives, the fact that -(-a) = a (effectively, turning around twice and ending up in the same direction) allows us to easily rewrite -1 - (-1) as -1 + -(-1) = -1 + 1 = 0 This is much harder to explain in terms of subtraction only; by seeing subtraction as adding the negative, it becomes relatively simple. In summary, your thoughts are valid, and not really new; seeing things this way is actually essential to learning algebra well. So you're in good company!

Chris responded:

Fantastic! To some small degree I have been hung up on this for years and despite moderate searching I have never come across any of these texts, so it is quite a relief for me to find that it is a valid way of thinking. Thanks again for your time.

I did some searching of my own, hoping to find examples of the raised negative in modern materials, but at the time they were hard to find; the only source I found no longer exists. It is easier to find references to it today; one reference I find to it, here, says

A simple and useful example is the use of two different “minus” signs in school (at least in the USA), the normal “-” sign for subtraction, and the raised minus sign for indicating a negative number, ⁻7 for negative seven. This notation, typically in use up to 8th grade, allows the text books to write 8 – ⁻7 rather than having to write 8 – (-7), for instance. Teachers typically think of this negative number notation as a kind of training wheels.

Wikipedia mentions it (here) without citation. American teaching standards don’t mention it. But I find examples of teaching materials using it from the U.S. (1972), the U.K. (expand *Use negative numbers in context*), Australia (e.g. pp. 14-18), New Zealand, and even the Philippines. (If you find better examples, let me know.)

Chris responded to what I showed him:

Thanks for this. Visually, the look of the expressions which mix the raised "-" and normal "-" are exactly right for me and express the meanings quite intuitively. The distinction between 5 - 6 and 5 + ^-6 is clear, as is the fact that they give the same result. Regards, Chris.

This question from 1998 asks about translating words to signed numbers:

Converting Words to Numbers Can you solve this problem? Tell me an integer to describe each situation:

1. 5 degrees below zero
2. a loss of 7 pounds
3. a gain of 10 yards
4. positive twelve
5. 3 feet below sea level
6. 2 degrees above zero

Tell me the opposite of each integer:

7. -8
8. 9
9. -15

I answered:

Hi, Cambree, The first set of questions deals with what we call "conventions"; that is, people who use math have generally agreed that certain "directions" should be "conventionally" thought of as "positive." Anything that is "up" or "forward" or "increasing" or "more" is positive, and anything that is "down" or "backward" or "decreasing" or "less" is negative.

In general, “positive” is taken to mean “in the direction we expect” while “negative” means “the opposite of that”. So we need to know what is “expected” in a particular context, which is commonly implied by the words used.

For example, if I make a profit of 5 dollars in my business, I would call that +5, and if I lose 5 dollars, that would be -5. Why? Because then whichever happens, I can add that number to my bank account to find out how much I have now. Similarly, if a mountain's base is 2 miles below sea level and its peak is 3 miles above sea level, then the altitude of its base is -2 and the altitude of its peak is +3, so the total height is (+3) - (-2) = 5 miles.

“Positive” tends to correspond to adding, “negative” to subtracting.

On the other hand, as I said, these are just conventions, and they really depend on what you're measuring. If I were in a submarine measuring depth, I would say my depth is +2 miles, because when I think about "depth" I mean something that increases as I go deeper. So a depth of +2 means the same thing as an altitude of -2. Therefore, the answer to these questions should include some sort of label. For instance, I would say "altitude = -3" for problem 5. You could also say "depth = +3" if you want to confuse your teacher, but then you'd have to bring me in to testify on your behalf, so maybe you'd better stick with -3.

I have too often seen textbook questions that are ambiguous because it isn’t clear which perspective to take.
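The bank account, mountain, and submarine examples can be written out as a short Python sketch. The function names here are invented for the illustration; the point is only that once we fix which direction counts as positive, a single addition or subtraction handles every case.

```python
def apply_change(balance, change):
    """Add a signed change to a balance: a profit is positive, a loss is negative."""
    return balance + change


def altitude_from_depth(depth):
    """Depth increases downward, so it is the opposite of altitude."""
    return -depth


# Profit of 5 and loss of 5, both handled by the same addition.
print(apply_change(100, +5))   # 105
print(apply_change(100, -5))   # 95

# Mountain with its base 2 miles below sea level and its peak 3 miles above.
base_altitude = -2
peak_altitude = +3
print(peak_altitude - base_altitude)   # 5 miles of total height

# A depth of +2 miles means the same thing as an altitude of -2 miles.
print(altitude_from_depth(2))          # -2
```

The label ("altitude" or "depth") is what tells us which sign to expect, which is why an unlabeled answer can be ambiguous.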

You should be able to do the rest by looking for words like "below" or "loss" to indicate a negative number. Or look on your thermometer and see what they call a temperature below zero.

Let’s answer them all, since the problem was due long ago:

1. 5 degrees **below** zero = **-5** degrees
2. a **loss** of 7 pounds = **-7** pounds “gained”
3. a **gain** of 10 yards = **+10** yards gained
4. **positive** twelve = **+12**
5. 3 feet **below** sea level = **-3** feet in “altitude” [but +3 feet of depth!]
6. 2 degrees **above** zero = **+2** degrees

I put the negative numbers’ units in “scare quotes” because you would not normally call, say, a loss of 7 pounds a gain; as I said above, this is contrary to expectations. The main use of such numbers is in formulas where a **variable** might be defined as a gain, and the negative value represents a loss. The fact is that, if we made a formula concerned with weight loss, we might actually expect a loss, and call a gain a negative loss. It all depends on what we call the quantity.

As for the last three questions, "opposite" just means to flip the number line around (stick a pin at the zero and give it a spin) and see where you land. The opposite of -3 is +3, and the opposite of +3 is -3:

    ------------------+----------+----------+------------------->
                     -3          0         +3

flips around to give:

    <-----------------+----------+----------+--------------------
                     +3          0         -3

so that -3 is where +3 belongs.

Just as in the words we use, “negative” means “opposite”. In fact, “opposite” is not used as a technical term with this meaning in math; many mathematicians would object to this very question. Instead, we say “the negative (or additive inverse) of -3 is +3”. But as we’ll see next, that can confuse students, so it is often avoided at first.

Answering the remaining questions,

7. opposite of -8 = +8

8. opposite of 9 = -9

9. opposite of -15 = 15

I hope this helps. Negative numbers aren't hard, but if you're confused, just remember that a few hundred years ago people thought mathematicians who talked about negative numbers were crazy. Let me know if you need more help.

We saw last week that even some mathematicians less than 200 years ago thought the idea was nonsense!

This is from later in 1998:

Solving -x = 3 Please tell me how to get the answer to -x = 3. I have no idea how a negative can equal a positive.

We use “negative” in two different ways: a **negative number** is one that is less than zero; the **negative of a number** is the

Doctor Roya answered, avoiding this confusion by sneaking in the back door:

Dear Tyler, Thank you for writing to Dr. Math. We are going to try to solve this problem backwards. By that I mean, we are going to look at the answer first and work our way back to the original problem. Let's look at the following example: 3 is a positive number. If we multiply 3 by -1, we get -3, which is a negative number. We say that -3 is the opposite of 3.

In fact, the symbol “\(-3\)” means “the negative *of* 3”, which is the same idea as “opposite”. The negative of a positive number is a negative number.

In the same way, 3 is the opposite of -3. How, you may ask? Well, just do the same thing. Multiply -3 by -1. We get (-1)(-3) = 3. If we drop the 1 in this new equation (we are allowed to drop the 1, since 1 times any number is that same number), we end up with: - -3 = 3. This is becoming more familiar, isn't it? Now, if we replace -3 with a variable (unknown) x, we end up with: - x = 3, which looks just like your problem.

Again, the negative *of* a negative number is positive.

So this just happens to be the solution: If \(x=-3\), then \(-x = 3\).

Going back to your question: If we read the equation (-x = 3), it says: We have a number x, that when we multiply by -1 the result is number 3. What number does x represent? Remember that (-1)(x) = 3 is really the longer way of writing the same equation. I hope that you now see how x must be a negative number. In fact x is equal to -3. Please keep in mind that x stands for a number. That number could be a positive number or a negative number. x is just a place holder. Please write back if you still have questions.

In solving this equation, we can just divide both sides by \(-1\), the coefficient of *x*. Or we can just say we are taking the negative (opposite) of each side: $$-x=3\\ -(-x)=-(3)\\ x=-3$$

Here’s a question from 2001 clarifying the definition of “negative”:

Is Zero Positive or Negative? Is zero a positive or a negative number? People consider it neither. I don't get it.

Two of us answered; first, Doctor Jaffee:

Hi Beth, In order for a number to be positive, it has to be larger than 0. So, 0 is not positive. In order for a number to be negative, it has to be smaller than 0. So, 0 is not negative. Therefore, 0 is neither negative nor positive.

Doctor Roy got his answer in less than two minutes later:

Let's reconsider what it means to be positive or negative. To be positive, a number must be greater than zero. To be negative, a number must be less than zero. If you read these two definitions carefully, you notice that the number zero is not covered by either case. So, the concept of positive and negative includes almost all the numbers, but it does not consider zero. Zero is considered special. By definition, it is neither positive or negative. Think of the number line as split into three parts: negative, 0, and positive, as follows:

       -5   -4   -3   -2   -1    0    1    2    3    4    5
    <---|----|----|----|----|----|----|----|----|----|----|--->
    <------ negative------------>0<-------------positive------>

I hope this helps, and please feel free to write back.

Zero is the border between positive and negative. A number that is “non-negative” can be either positive or zero.
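The three-way split of the number line can be stated as a tiny Python sketch. The function names are made up for this illustration; they just restate the definitions given above.

```python
def classify_sign(x):
    """Return which of the three parts of the number line x belongs to."""
    if x > 0:
        return "positive"
    if x < 0:
        return "negative"
    return "zero"   # neither positive nor negative


def is_non_negative(x):
    """Non-negative means positive or zero."""
    return x >= 0


print(classify_sign(-5))    # negative
print(classify_sign(0))     # zero
print(classify_sign(2.5))   # positive
print(is_non_negative(0))   # True
```

Note that "non-negative" and "positive" differ only at zero, which is exactly why the extra word is needed.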

This question, from 2002, raises one more important distinction:

Is Zero a Signed Number? Please tell me, Dr. Math, Is zero a directed number? What is the direction of zero? If zero is not a directed number, why do we use it in the set of directed numbers? Example: When we place directed numbers in line: -4, -3, -2, -1, 0, +1, +2, +3, +4

I answered after checking this relatively unfamiliar term:

Hi, Intisar. Probably you are using "directed numbers" to mean something like "signed numbers"; it appears that this is a British school term for what we would properly call "integers" or "real numbers," with the emphasis on the presence of a sign. It is meant to be descriptive, not a precise definition, and I think you may be taking the term too seriously. I would say that "the set of directed numbers" refers to numbers in which we are allowing both a length and a direction (sign). In the case of zero, the direction is meaningless, since +0 and -0 are the same; but that does not mean that it has no sign, only that the sign makes no difference.

As I mentioned last week, “integers” doesn’t just mean “signed” but also “whole”; mathematicians don’t commonly need the terms “signed” or “directed”, but they are useful when you first encounter the concept of sign.

When we write “-0” or “+0”, we are *writing* a sign, which would not be allowed if we were treating it as unsigned. We are not saying that 0 in itself *has* a sign (that is, is either negative or positive), as we just saw.

The important point about the set is not that each member of the set _must_ have a significant "direction," but that directions are _allowed_, so that the set does not consist only of positive numbers. That is, no claim is made that you can separate out the size and direction for every such number, so that each number (including zero) should have a specific direction; rather, numbers in the set are built by combining a "size" and a "direction" (sign), and there is nothing wrong with both +0 and -0 turning out to be the same number.

A similar issue arises with **vectors**, which are described as having both length and direction; there, too, the zero vector doesn’t actually have a direction.

Moreover, I would not even say that any particular number is "a directed number"; rather, a number like 1 (or 0) may be treated either as a mere number (by children who have not yet learned about negative numbers, or when only positive numbers make sense), or as a "directed number" in contexts where signs are meaningful. It is really only "the set of directed numbers," or "operations on directed numbers," that are significant, not the individual numbers.

In computer programming, a variable can be specified either as a signed or an unsigned number; this affects how it is stored (in effect, whether there is room for a sign), and what can be done to it.
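Python's own integers are always signed, but its built-in `int.to_bytes` and `int.from_bytes` methods let us sketch what the signed/unsigned choice means for storage. The helper name `fits_in_bytes` is invented for this illustration.

```python
def fits_in_bytes(value, num_bytes, signed):
    """Report whether value can be stored in num_bytes under the given convention."""
    try:
        value.to_bytes(num_bytes, byteorder="big", signed=signed)
        return True
    except OverflowError:
        return False


# One unsigned byte holds 0 through 255; one signed byte holds -128 through 127.
print(fits_in_bytes(200, 1, signed=False))   # True
print(fits_in_bytes(200, 1, signed=True))    # False: no room once a sign is allowed
print(fits_in_bytes(-1, 1, signed=False))    # False: negatives are not allowed at all
print(fits_in_bytes(-1, 1, signed=True))     # True

# The same stored byte is read differently under the two conventions.
raw = b"\xff"
print(int.from_bytes(raw, byteorder="big", signed=False))   # 255
print(int.from_bytes(raw, byteorder="big", signed=True))    # -1

# Zero is stored the same way either way, and +0 and -0 are the same integer.
print((0).to_bytes(1, byteorder="big", signed=True) ==
      (0).to_bytes(1, byteorder="big", signed=False))       # True
print(+0 == -0)                                             # True
```

As with "directed numbers", being signed is a property of how the whole set is treated, not of any one value in it.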

If we tried to formally define the "integers" (directed whole numbers) or the "real numbers" as numbers that combine size and direction, we would have difficulty in stating clearly what we mean. But if you are using "directed numbers" just to indicate that you are working with numbers with (optional) signs, and not as a formal definition of a set, then there should be nothing wrong with accepting that zero belongs in this set.

I’ll close with a question from a teacher in 2003 that touches on several of the topics we’ve seen both here and last week:

Talking About Zero, Absolute Zero and Negative Numbers Zero is a tricky number to explain, especially when children are interested in why we have negative numbers. It is difficult to conceptualize taking something away from nothing. It becomes even trickier when thinking about absolute zero. Are there any useful strategies other than temperature to introduce negative numbers, and also to teach about zero, its properties, and how they differ from absolute zero?

I answered, starting with “absolute zero” and temperature:

I'm not sure what you are asking about absolute zero; I'm only familiar with that term as a temperature, and as far as numbers themselves are concerned, absolute zero is just zero. The only special thing I see about it is how it fits into the scale: in the case of Kelvin temperature, we have a scale with a definite starting point, and negative temperatures do not exist (though I understand that this is not really quite true if you dig deep enough into the physics of temperature!). On the other hand, in scales where no (known) lowest value exists, negative numbers have to be allowed, and zero becomes not an absolute end point, but a mere reference point along the scale from which quantities are measured in both directions. If the concept of absolute zero temperature had not been discovered, then we could not have a Kelvin scale, and all temperature scales would have to, at least theoretically, allow for negative temperatures.

Temperature is not only a good place to see negative numbers, but also a model of the difference between applications that allow them, and those that do not. Here are the Kelvin, Celsius, and Fahrenheit scales for comparison:

Kelvin starts with zero as the lowest possible temperature, so in principle it is a set of **unsigned numbers**. But Celsius puts zero at the freezing point of water, and Fahrenheit at the lowest temperature obtainable in the lab at the time, making them sets of **signed numbers**. The latter both *allow* negatives, though not *all* negatives are possible.
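For readers who like to see the numbers, here is a minimal Python sketch of the standard conversions between the three scales. The function names are my own for this illustration, and rejecting negative Kelvin values is a simplifying assumption that follows the everyday view of the scale described above.

```python
ABSOLUTE_ZERO_CELSIUS = -273.15


def kelvin_to_celsius(kelvin):
    """Convert Kelvin to Celsius, treating Kelvin as an unsigned scale."""
    if kelvin < 0:
        raise ValueError("the Kelvin scale starts at absolute zero")
    return kelvin + ABSOLUTE_ZERO_CELSIUS


def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit; both scales allow negative values."""
    return celsius * 9 / 5 + 32


print(kelvin_to_celsius(0))          # -273.15: the lowest Celsius value
print(kelvin_to_celsius(273.15))     # 0.0: water freezes
print(celsius_to_fahrenheit(0))      # 32.0
print(celsius_to_fahrenheit(-40))    # -40.0: the two scales agree here
print(round(celsius_to_fahrenheit(ABSOLUTE_ZERO_CELSIUS), 2))   # -459.67
```

The zero of Celsius and of Fahrenheit is only a reference point, so their values run in both directions from it until they reach the floor set by absolute zero.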

There are some situations where only positive numbers make sense, and in those cases we have the equivalent of a Kelvin scale, with an absolute zero. For example, a person's height can't be negative; no one can be less than zero meters tall! But altitude, which on the surface sounds the same, does allow negatives; I can be 100 meters above sea level, or 100 meters below sea level (a negative number). Again, we can find an alternative scale for altitude that has an "absolute zero", namely the distance from the center of the earth; so altitude referenced to sea level is just a convenience for people who live near sea level (as Celsius is convenient for people who "live" in the zone where water is liquid). But not all scales can be made absolute; coordinate systems for space offer no (known) "farthest left" location, so negative numbers are required no matter what you do. Which leads into my next comment ...

Here are the two altitude scales (using miles, rounded), with zero at the center of the earth and at sea level, respectively:

The International Space Station is at 4250 miles from the center, but 250 miles above sea level.

Apart from temperature, I think the only good way to introduce negative numbers is with a number line (of which temperature is just a familiar example we can point to in a child's environment, especially when they live in a cold climate). If you look at what we say about negative numbers, such as Positive and Negative Integer Rules http://mathforum.org/library/drmath/sets/select/dm_pos_neg.html you will find a lot of references to the number line. The basic idea is that, if we want to locate every point on a line by associating it with a number, positive numbers just aren't enough! In order to put any numbers on the line in numerical order we have to have a zero point from which we start counting; and in order to label points in both directions, we need negative numbers. Once you have that idea of negative numbers as labels for points to the left of zero on a number line, everything else falls into place.

Positive numbers (actually, non-negative numbers!) are appropriate for measuring on a **ray**, which goes in only one direction; for a **line**, or anything else that can extend without bounds in either direction, we need negative numbers to be available.

After building that view of negative numbers, you can return to counting situations and ask whether negative numbers ever make sense there. That's when ideas of "owing" or "debt" arise as uses for negative numbers. But probably those ideas won't be clear without the number line model to make the concept concrete. In a sense, we are then modeling counting in terms of the number line, extending the idea of number to allow for negatives that result from subtracting more than you have. Such modeling is really the essence of mathematics.

This is the idea we closed with last week.

Next time, I’ll look at two more questions that would have made this post too long; they are about the difference between “negative” and “minus”. How are negative numbers and subtraction related, and how are they different?


First, we have a question from 1998, from a child who didn’t yet know about negative numbers:

Introduction to Negative Numbers Dear Dr. Math, I want to know what does 11 - 12 equal? Sincerely, Jacob

I answered:

Hi, Jacob. That's a good question. It's easy to say what 12 - 11 is, so shouldn't 11 - 12 have an answer too? Until recently, my own 6-year-old daughter would have written 11 - 12 = C, which stood for "Can't do it." For some questions, that's the best answer you can give. Suppose you had 11 carrots on your dinner plate, and because I love carrots so much, while your head is turned I try to take 12 carrots from your plate. I can't do it!

My daughter is now working in a school (but not teaching math)! She invented this notation for herself; then I explained negative numbers to her.

Sometimes only positive numbers make sense, and for some problems “no solution” is the correct answer. But sometimes we can get past this barrier:

But suppose you had 11 dollars in your bank, and you wanted to buy a game that costs 12 dollars. You can't do it, but I might be kind enough to lend you a dollar. Then you would spend all 11 of your dollars, and you would owe me another dollar. You would actually have one dollar less than nothing, because as soon as you earned another dollar, it would go to me and you would have nothing!

We’ve put an IOU in the bank:

This will “cancel out” the next dollar put into it:

For a similar way to think about “negative apples”, see Picturing Negative Numbers.

Several hundred years ago mathematicians realized that there were a lot of problems they could solve if they had a way to talk about numbers less than zero. So they decided to write 0 - 1 = -1, which is read as "negative one" and means "one less than zero." From what I just said about owing a dollar, you can see that:

11 - 12 = -1 (when you spend 12 dollars, you owe me one)

and

-1 + 1 = 0 (when you earn another dollar, you will have nothing)

This idea actually arose before the modern use of symbols; they would have written \(-1\) as “m.1”. The point is that they had a concept of negative numbers.

There's actually another place where you can see negative numbers very easily. Look on a thermometer, and you'll see that the temperature can go down to zero degrees, but then it can keep getting lower. Temperatures below zero are written as negative numbers, like -10 degrees which means 10 degrees below zero. If the temperature now is 11 degrees, and it gets 12 degrees colder, it will be -1 degrees.

We’ll see this example a lot; rather than **counting** negative objects, we are **measuring** positions along a line that goes in both directions, which makes more sense.

Next, from teacher Mary Ellen in 1999:

Using Integers I need real-life examples of integers to show my 7th grade students, to demonstrate the fact that they need to be able to use them in math class. I've mentioned scores in some games, yardage in football, and the balance page of an accountant's book. Can you give me some more? Thanks.

Doctor Rick answered this:

Hi, Mary Ellen, welcome to Ask Dr. Math! I presume you're talking in particular about places where we use numbers that may be either positive or negative, and it's really helpful to be able to treat them as the same kind of number, instead of needing a separate rule for each combination of negative and positive numbers. Some examples aren't really integers but positive or negative real numbers.

We’ve seen a lot of people confuse the concepts of “integer” and “signed number”, presumably because of the way they are often introduced together in school. **Integers** can be positive or negative, but the primary meaning of the word is “**whole**“: They don’t include fractions. But even fractions can have **signs**. For more, see What is an Integer?

How about temperatures? Conversion between Fahrenheit and Celsius on a sub-freezing day is a good exercise in using negative numbers. Elevations go negative in places like Death Valley and the Dead Sea. Comparing the base-to-peak heights of Mount Everest and Mauna Loa is an exercise in subtracting negative numbers. Latitude and longitude are easier to work with if you take east/north as positive and west/south as negative. The most obviously useful calculation here is in working with time zones - you'll find charts with time zones labeled EST = -5, Eastern Europe = +2, etc.

All of these are measurements along a line, up and down or left and right.
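Two of these exercises are short enough to sketch in Python. The conversion formulas are the standard ones; the sample temperatures are chosen here for illustration, and the time-zone offsets are the two quoted above.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion; works unchanged for values below zero."""
    return (f - 32) * 5 / 9


def celsius_to_fahrenheit(c):
    """Inverse of the conversion above."""
    return c * 9 / 5 + 32


print(fahrenheit_to_celsius(14))   # a sub-freezing day: -10.0
print(celsius_to_fahrenheit(-40))  # the one temperature that reads the same: -40.0

# Time-zone offsets from the text, in hours relative to UTC.
EST = -5
EASTERN_EUROPE = 2
print(EASTERN_EUROPE - EST)        # hours Eastern Europe is ahead of EST: 7
```

Neither function needs a separate branch for temperatures below zero, and the time difference is a single subtraction because the offsets carry their own signs.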

Years are a peculiar case. AD and BC were invented around AD 525, before negative numbers (or zero) were really understood, so there was no year zero. The year after 1 BC was AD 1. If it had been done right, we would have been able to compute the years between 43 BC and 33 AD as 33 - (-43) = 76. But because dates aren't integers, you have to say 33 + 43 - 1 = 75. In other words, when the calendar was devised, people just accepted the need for special cases, but with the invention of integers, we found a better way.

For a fuller discussion of this, see 2020 and the Y0K Problem.
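One way to remove the special case is to renumber the years so that 1 BC becomes 0, 2 BC becomes -1, and so on (the convention known as astronomical year numbering). The following Python sketch assumes that convention; the function names and the `(year, era)` tuples are illustrative choices, not something from the original answer.

```python
def to_astronomical(year, era):
    """Map a calendar year to a signed integer with a year 0 (1 BC -> 0, 2 BC -> -1)."""
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError("era must be 'AD' or 'BC'")


def years_between(start, end):
    """Elapsed years from start to end, each given as a (year, era) pair."""
    return to_astronomical(*end) - to_astronomical(*start)


print(years_between((43, "BC"), (33, "AD")))  # 75
print(years_between((1, "BC"), (1, "AD")))    # 1
```

After the conversion, plain subtraction gives the 75 years that the "33 + 43 - 1" rule produces.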

By the way, the federal government does not assume that taxpayers understand integers, so the 1040 form uses the special-case approach: "If line 64 is LESS than line 59, subtract line 64 from line 59 and write it in line 65. If line 64 is GREATER than line 59, subtract line 59 from line 64 and write it in line 66." (That's the idea anyway, I got my form yesterday but I don't have it in front of me.)

Here is an actual example:

On line 34 a positive number is an *overpayment*, while on line 37 a positive number is an *underpayment*. In each case, a positive amount of money has to be transferred.

This is probably a wise decision on the part of the Internal Revenue Service, since many people would make mistakes with negative numbers.
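The contrast between the two approaches can be sketched in Python. This is only an illustration of the idea, not the actual form logic: the field names `overpaid` and `owed`, and the sample amounts, are hypothetical.

```python
def special_case_form(payments, tax):
    """Mimic the form: two separate lines, each holding a non-negative amount."""
    if payments > tax:
        return {"overpaid": payments - tax, "owed": 0}
    return {"overpaid": 0, "owed": tax - payments}


def signed_balance(payments, tax):
    """One signed number: positive means a refund, negative means an amount owed."""
    return payments - tax


print(special_case_form(900, 1000))  # {'overpaid': 0, 'owed': 100}
print(signed_balance(900, 1000))     # -100
```

The signed version is shorter, but it relies on the reader interpreting the sign correctly, which is exactly what the form avoids.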

I can't think of any everyday cases where we multiply negatives by negatives. But when your students learn about quadratic equations, they will be benefiting again from the no-special-cases property of integers. Early developers of algebra had to present solutions for 6 kinds of quadratic equations: http://www-groups.dcs.st-andrews.ac.uk/~history/Mathematicians/Al-Khwarizmi.html Your students will learn a one-size-fits-all solution method.

The link says:

He first reduces an equation (linear or quadratic) to one of six standard forms:

1. Squares equal to roots [e.g. \(x^2=10x\)].

2. Squares equal to numbers [e.g. \(x^2=16\)].

3. Roots equal to numbers [e.g. \(x=10\)].

4. Squares and roots equal to numbers; e.g. \(x^{2}+10x=39\).

5. Squares and numbers equal to roots; e.g. \(x^{2}+21=10x\).

6. Roots and numbers equal to squares; e.g. \(3x+4=x^{2}\).

Don’t let the IRS get its hands on the quadratic formula!

This is from 2002:

Practical Applications of Negative Numbers I am preparing a unit on operations with negative numbers for a class of very bright and accelerated fifth graders. I would like them to have some of the theory and background on negative numbers other than the obvious temperature and checkbook examples. What is the history of negatives? Why were they invented, considering that you can't have -2 of a given object? Can they be used to count anything? What have they been used for? I would like to go deep, so any guidance you have would be most appreciated. Thanks.

I answered, starting with the history:

Hi, Michael. This site, listed in our FAQ under Math History, should be your first source for questions on history: MacTutor Math History Archive http://www-history.mcs.st-and.ac.uk/ In particular, you will find some relevant material under the history of zero: http://www-history.mcs.st-and.ac.uk/HistTopics/Zero.html Check the link on Brahmagupta for some further details on the earliest use. The Chinese also used negative numbers very early, writing them in black and positive numbers in red. Negative numbers were not taken seriously, though, until the time of Cardan and Stifel, whom you can look up there; and it was not until some time later that negative numbers were given "equal rights" with positives, so that one did not need to know ahead of time whether a value was positive or negative. We have some discussion of this topic here (check out the links): Negative Number History http://mathforum.org/library/drmath/view/52593.html

The “equal rights” idea means that until then, negative numbers were not considered valid answers, just a trick to use in the middle of a problem. We’ll be seeing that this idea continued long beyond Cardan (1500’s).

To answer your specific questions, I would say that negative numbers were invented not to count anything (unless you think of "counting" a debt, which is how Brahmagupta described the concept), but in order to make it much easier to work with equations. The best example I can think of, however, is beyond your students, unless you just show them the problems without going into the solutions. This is the solution of quadratic equations. Until the use of negative numbers, each kind of equation had to be treated separately:

x^2 + ax = b

x^2 - ax = b

and so on. Once it was found that negative and positive numbers could be handled in the same way, the same methods could be used for all quadratic equations, since all could be expressed in the same general form by allowing variables to have either sign:

x^2 + ax + b = 0

(Note that if a and b are positive, this has no positive solutions, so they would not even have considered it.)

The Hindu mathematician Brahmagupta (600s) solved at least the first form; later authors such as Bhaskara (1100s) did allow negative coefficients and even (to some extent) negative solutions. Uniting all these forms into one meant there could be one method, and one quadratic formula.
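To see what "one method" buys us, here is a minimal Python sketch of the quadratic formula applied to the three mixed examples from al-Khwarizmi's list, each rewritten as \(ax^2+bx+c=0\) with signed coefficients. The function name and the choice to return only real roots are assumptions made for this illustration.

```python
import math


def solve_quadratic(a, b, c):
    """Real solutions of a*x**2 + b*x + c = 0 (a != 0), smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real solutions
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# Each historical form becomes the same call once signs are allowed:
print(solve_quadratic(1, 10, -39))  # x^2 + 10x = 39  ->  [-13.0, 3.0]
print(solve_quadratic(1, -10, 21))  # x^2 + 21 = 10x  ->  [3.0, 7.0]
print(solve_quadratic(1, -3, -4))   # 3x + 4 = x^2    ->  [-1.0, 4.0]
```

The negative roots that appear here are the ones the early authors would have discarded; the single formula produces them without any extra rule.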

In my mind the most significant use of negative numbers is in coordinate systems. We can locate points going both left and right, up and down from an origin, which would be impossible using only positive numbers. We would have to find an origin for which all points of interest were on the same side. So allowing negative numbers frees us up to describe things that happen anywhere in space, such as orbits of planets or graphs of equations. (Similarly, by allowing negative temperatures, we don't have to use a scale that starts at the lowest temperature we can observe; that's how the Fahrenheit scale originated, as a way to avoid negative numbers.) So really the number line is the central concept in working with negative numbers. Without them, we can't name all the points on the line, but only "half" of them.

Imagine having to fit everything you are working with into one quadrant of the coordinate plane!

But Fahrenheit’s scale, in which 0 was the lowest temperature they could achieve in the lab using salt and ice mixtures, was just as reasonable as the IRS decision to use only positive numbers.

Here’s one from 2004:

Why Do We Learn about Negative Numbers? Why do we have to learn about negative numbers? The only context where I can see it being useful is in determining temperature. My child is just being introduced to negatives and both he and I are having difficulties with the subject.

Doctor Ian took this:

Hi John, Strictly speaking, there's nothing we can do with negative numbers that we couldn't do without them. It's just that they make some things easier--in particular, they relieve us from having to worry about special cases when dealing with subtraction. For example, suppose we're talking about longitudes. We could always talk about "degrees east" or "degrees west". But then we have to have a bunch of rules for subtracting longitudes, e.g.,

a east - b east = (a - b) east if a > b

a east - b east = (b - a) west if b > a

But if we just decide that east is positive and west is negative, then we can just subtract,

a - b = whatever

and let the sign keep track of the direction of the result, using a few simple rules.

This is very much like the IRS version, isn’t it? Signed numbers unify cases. We do often use directions in stating latitude and longitude, as in N38.889°, W77.035° or even N38°53′ 20″, W77°2’6″; but in calculations you would use (38.889, -77.035).
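Here is a small Python sketch of the two styles side by side. The case-based function covers only the "east minus east" rule quoted above; the helper names and the tuple representation are assumptions for illustration.

```python
def east_minus_east(a, b):
    """Case-based rule from the text: both inputs are degrees east, no signs allowed."""
    if a >= b:
        return (a - b, "east")
    return (b - a, "west")


def to_signed(degrees, direction):
    """East is positive, west is negative."""
    return degrees if direction == "east" else -degrees


def to_labeled(signed):
    """Turn a signed longitude difference back into (degrees, direction)."""
    return (abs(signed), "east" if signed >= 0 else "west")


print(east_minus_east(10, 30))                                     # (20, 'west')
print(to_labeled(to_signed(10, "east") - to_signed(30, "east")))   # (20, 'west')
print(to_labeled(to_signed(10, "east") - to_signed(30, "west")))   # (40, 'east')
```

The case-based version would need further rules for every east/west combination; the signed version handles the third line, which mixes directions, with the same subtraction.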

But the main reason for having to learn about them is that everyone _else_ is learning about them, and so if you don't learn about them, you won't be able to understand what other people are talking about in a wide variety of situations--which is kind of like moving to a foreign country and deciding that you don't really need to learn the language. You can live that way, but you miss out on a lot. Does this make sense?

John replied,

Thank you very much for your prompt and very informative answer, this has clarified the situation for us.

Finally, a great question from 2008:

Negative Numbers in Real Life I'm finding it very difficult to imagine negative numbers outside of math. I've read some examples of their use here and elsewhere, but I'm still flabbergasted. I understand negative temperatures, because they're based on the temperature of water. The temperature of frozen water is 0°C. Anything below is negative. But measurements are a concept. My trouble is with more concrete things. Mathematically, I can subtract 6 apples when I only have 5. I can't do this in reality. I can use negative numbers in math, but outside of measurements, I can't really grasp them in the real world. For example, I can't have negative money. After I'm bankrupt, I can't spend any more money (without going to a bank or getting a credit card). But in the "world of math", I can take $600 from $500 and get -$100. I'm looking for an example where negative numbers work in the "real world" without measurements. How can the math be accurate but impossible?

I answered this one:

Hi, Cody. One thing that can help is to realize that mathematics is a world of its own that can be used to MODEL things in the real world, but, like any model, is not IDENTICAL to that real world. Negative numbers are part of the "model world", not the real world. So there are some situations where negative numbers make sense (and a negative answer to a problem is valid), and others where they do not (so that a negative answer just means there is no solution to the original problem). In the case of temperatures, the 0 point (except in absolute temperature) is arbitrary, so that SOME negative values are possible, but others are not.

That is, when we apply math, we are “mapping” the problem into abstract models that match the way the real world works only up to a point. Then we “map” our answer back into a world where some answers just mean “can’t do it”.

In the case of money, a negative answer may or may not be valid. If you are just spending money from a basic checking account, a negative balance means that you are overdrawn--but it DOES still have meaning, because you now owe that much money to the bank. If you have an account with automatic overdraft protection, the negative balance means that you have borrowed that much and have to pay it back. So how to interpret the negative result depends on the situation; often positive and negative are just two sides of the same coin, each with its own interpretation.

So sometimes positive and negative mean “gained” vs. “lost”, and other times negative money means you did something wrong.

The idea of negative numbers was often considered suspect even into the 1800's. I've read a book by a mathematician of that time trying to present algebra in a way that didn't treat negative numbers as real; he called them fictitious, and presented them as just a shorthand for operations that should properly be done in reverse. But he admitted that using negative numbers made the work easier, always gave the right answer (when interpreted correctly), and unified what would otherwise have required several cases (depending on which number is greater, for example). That is, negative numbers serve as a good, though imperfect, MODEL--you don't have to recognize the existence of the negative numbers themselves as concrete entities in order to make good use of them. (And, by the way, even counting numbers are really an abstract concept too--you never saw a "three" by itself, did you? ALL numbers live in the math world, not the real one.)

This book was Augustus De Morgan’s *Elements of Algebra* (1837), which you can read here:

After 20 pages of explaining how to “correct” “impossible” operations, he says,

We don’t follow his line of thinking today (if anyone else ever did); but he demonstrates that negative numbers work so well that even if you don’t consider them real, it makes sense to *pretend* they are! The idea of seeing signed numbers as an abstract concept independent of (though related to) the real world, has rescued us from this.

What we are doing when we use negative numbers is translating a real problem into a problem about, say, locations on a number line (which correspond to amounts of money you have or owe, say), solving that new problem, and then deciding how to translate the answer back into the real world problem. The negative numbers live in this separate world of math, and may have various meanings or lack of meaning in the real world.

Cody wrote back:

Now THAT was a well put together answer. It's more clear now, thanks.

It is not uncommon for students to ask about why they get different answers using different methods. Usually the answer is that the answers are really equivalent. This time, the answers really are different! This was partly the result of being taught an incomplete technique, omitting important cautions. And although the question is about calculus, the real issues all involve some tricky trigonometry.

Here is the question, from Shreshth, at the end of November:

I have to find the derivative of y = arcsin(2x * sqrt(1 – x^2)) .

I can take x = sin θ or x = cos θ, which will lead me to the answers 2/sqrt(1 – x^2) and -2/sqrt(1 – x^2). But these derivatives are not equal. Is this possible?

I have attached images for the procedure to solve using x = sin θ:

Thanks.

Shreshth

This is not the usual method for differentiating, which would not involve such a substitution; rather, the substitution is meant to simplify the work of differentiation. The answer obtained is correct; Shreshth didn’t show his own work that gives the *wrong* answer. Let’s do what I presume he did:

$$\begin{align*}y&=\sin^{-1}(2x\sqrt{1-x^2})\\ \text{ Putting }&x=\cos\theta\\ y&=\sin^{-1}(2\cos\theta\sqrt{1-\cos^2\theta})\\ y&=\sin^{-1}(2\cos\theta\sqrt{\sin^2\theta})\\ y&=\sin^{-1}(2\cos\theta\sin\theta)\\ y&=\sin^{-1}(\sin2\theta) = 2\theta\\ \text{ Putting }&\theta=\cos^{-1}x\\ y&= 2\cos^{-1}x\\ \frac{dy}{dx} &= -\frac{2}{\sqrt{1-x^2}}\end{align*}$$

This does indeed give the wrong sign. Can you see why?

Doctor Fenton first answered the explicit question:

A function cannot have two different derivatives. It is possible to be able to compute the derivative in two different ways, yielding two different formulas, but the two formulas must be equivalent. For example, if we differentiate y = sin(2x) by the Chain Rule and obtain

\(\frac{dy}{dx} = 2\cos(2x)\),

and also write sin(2x) as 2 sin(x) cos(x) and differentiate using the Product Rule to get

\(\frac{dy}{dx} = 2\cos^2(x) - 2\sin^2(x)\),

then although the two results look different, they are actually the same, by the identity

\(\cos(2x) = \cos^2(x) - \sin^2(x)\).
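A quick numeric spot check of this example, as a Python sketch: the sample points are arbitrary, and agreement at a few points illustrates the identity rather than proving it.

```python
import math


def chain_rule_form(x):
    """Derivative of sin(2x) obtained with the Chain Rule."""
    return 2 * math.cos(2 * x)


def product_rule_form(x):
    """Derivative of 2 sin(x) cos(x) obtained with the Product Rule."""
    return 2 * math.cos(x) ** 2 - 2 * math.sin(x) ** 2


for x in (-2.0, -0.3, 0.0, 1.0, 2.5):
    assert math.isclose(chain_rule_form(x), product_rule_form(x), abs_tol=1e-12)
print("the two formulas agree at every sample point")
```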

This is what we usually find when someone asks about getting two different answers. But the two answers Shreshth got *can’t* be the same. So what is happening? We’ll get to that later; but first, why is he using this method?

First of all, I don’t know why you are using a substitution to compute the derivative. The computation is straightforward, using the Chain Rule. You write \(u = 2x\sqrt{1-x^2}\) and use the Chain Rule to write

\(\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}\).

There is no reason to substitute a different formula for x.

The substitution seems to be used to verify an identity that

\(\sin^{-1}\left(2x\sqrt{1-x^2}\right) = 2\sin^{-1}(x)\) for \(-\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}\),

so that on this interval, the derivative is just \(\frac{2}{\sqrt{1-x^2}}\).

We’ll see the full work without substitution soon. And we’ll also be seeing that this identity is correct … on the given domain. But both substitutions are missing something essential:

Your two substitutions are not equivalent. Writing \(x = \sin(\theta)\) when \(-\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}\) assumes that θ is in the fourth or first quadrant, so \(-\frac{\pi}{4} < \theta < \frac{\pi}{4}\). In this interval for x, we can write x as sin(θ) for θ in the given θ-interval.

Why only in this interval? We’ll see more later, but ultimately it will be because of how we define the *inverse* function, whose range is taken to be \([-\frac{\pi}{2},\frac{\pi}{2}]\). In the following picture of the unit circle (where the *y* coordinate of a point is the sine), the red vertical line represents the domain of the inverse sine, the red semicircle represents its range; the blue line and sector represent the stated domain of *x*, and the implied domain of *θ*, respectively.

Observe that on this interval, the cosine is positive; this will be important to know later. The important point for the moment is that there are other angles with sines in the given interval, so that the substitution as stated did not uniquely define theta.

But writing \(x = \cos(\theta)\) when \(-\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}\) assumes that θ is in the first or second quadrant, so \(\frac{\pi}{4} < \theta < \frac{3\pi}{4}\), and the substitutions are not equivalent. When you replace \(\sqrt{1-\sin^2(\theta)}\) with cos(θ), that is incorrect in the second quadrant. You need to take the negative square root in the second quadrant. The substitution of cos(θ) for in the given interval for x is not valid.

(I think Doctor Fenton is talking about the (valid) work that was shown, using the sine, not about the (invalid) work, using the cosine, where we actually replace \(\sqrt{1-\cos^2\theta}\) with \(\sin\theta\), which turns out to be valid. I only came to realize this while editing this post.)

Here are the domain and range of the inverse cosine (in red), and the implied domain of *θ* (in blue):

In the second quadrant, the cosine is negative, but the sine is positive. That’s why the radical is, perhaps accidentally, handled correctly. What’s most important, as we’ll see, is that *these angles are not in the range of the inverse sine* (which is the function explicitly used in the problem).

Doctor Fenton’s main point is that those complexities are unnecessary:

But mainly, there is no reason to make such a substitution to compute the derivative in this case. The result of using the Chain Rule as I described does lead to a somewhat complicated formula, but this does simplify to \(\frac{2}{\sqrt{1-x^2}}\), since

\(\frac{1}{\sqrt{1-u^2}} = \frac{1}{\sqrt{1-\left(2x\sqrt{1-x^2}\right)^2}} = \frac{1}{\sqrt{1-4x^2\left(1-x^2\right)}} = \frac{1}{\sqrt{1-4x^2+4x^4}} = \frac{1}{1-2x^2}\),

since in the given interval for x, you need to write \(\sqrt{1-4x^2+4x^4} = 1-2x^2\) to take the positive square root.

Let’s carry out the whole work of differentiating \(\arcsin\left(2x\sqrt{1-x^2}\right)\). We’ll use the fact that \(\frac{d}{dx}\arcsin(u)=\frac{1}{\sqrt{1-u^2}}\frac{du}{dx}\) (by the chain rule), together with the product rule:

$$\frac{d}{dx}\arcsin\left(2x\sqrt{1-x^2}\right)\\=\frac{1}{\sqrt{1-\left(2x\sqrt{1-x^2}\right)^2}}\frac{d}{dx}\left(2x\sqrt{1-x^2}\right)\\=\frac{1}{\sqrt{1-4x^2\left(1-x^2\right)}}\left(\frac{d}{dx}(2x)\cdot\sqrt{1-x^2}+2x\cdot\frac{d}{dx}\sqrt{1-x^2}\right)\\=\frac{1}{\sqrt{1-4x^2+4x^4}}\left(2\sqrt{1-x^2}+2x\frac{-x}{\sqrt{1-x^2}}\right)\\=\frac{1}{\sqrt{1-4x^2+4x^4}}\frac{2(1-x^2)-2x^2}{\sqrt{1-x^2}}\\=\frac{1}{\sqrt{(1-2x^2)^2}}\frac{2-4x^2}{\sqrt{1-x^2}}\\=\frac{1}{1-2x^2}\frac{2(1-2x^2)}{\sqrt{1-x^2}}\\=\frac{2}{\sqrt{1-x^2}}$$

Doctor Fenton’s last comment needs emphasis, as this is where the restricted domain in the problem is used. Near the end of the work, we needed to take the square root of \(1-4x^2+4x^4\). Because we know that \(-\frac{1}{\sqrt{2}}<x<\frac{1}{\sqrt{2}}\), we know that \(0\le x^2<\frac{1}{2}\), so that \(1-2x^2>0\). Outside of the specified interval, we would need the negative sign to get the positive root, and the answer would be … what Shreshth got using the cosine!
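Written out explicitly, the sign choice in that square root is just the definition of absolute value. This restates the step above for the whole domain of the function rather than adding anything new:

$$\sqrt{1-4x^2+4x^4}=\sqrt{\left(1-2x^2\right)^2}=\left|1-2x^2\right|=\begin{cases}1-2x^2, & |x|\le\frac{1}{\sqrt{2}}\\ 2x^2-1, & \frac{1}{\sqrt{2}}<|x|\le 1\end{cases}$$

Dividing \(2(1-2x^2)\) by the first branch gives \(+2\), and by the second gives \(-2\), which is where the two signs of the derivative come from.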

Shreshth replied,

The reason I am using substitutions is because the school books are teaching those. So, I had to use those.

Thanks a lot for your help!

In light of this, Doctor Rick joined in, to correct specific errors in the method being taught:

Hi, Shreshth. I would like to add some comments.

If you are indeed being taught to use substitutions in finding derivatives, then you will need to learn to avoid errors to which that method is prone. Could you tell us more about this substitution method? Under what conditions are you taught to make a substitution, and how do you decide what substitution to make? Could you show another example of this?

We’ll be seeing more examples, but not the general rules that are taught.

In the work you show, with the substitution x = sin θ, there are two errors.

The first error is that no interval is specified for θ, and therefore the substitution is not well defined. For instance, if x = 1/2, we could have θ = π/6 or 5π/6 (or any angle coterminal to these). Saying \(\theta = \sin^{-1}x\) would have eliminated the ambiguity, because the restriction is built into the definition of \(\sin^{-1}\), but we need to be aware of this, and take it into account in the work that follows.

When we make a substitution, we need to fully specify which values of each variable correspond to one another; the substitution must be one-to-one. Just saying “\(x=\sin\theta\)” doesn’t indicate which value of *θ* corresponds to any given value of *x*.

That is, though the problem specifies a domain for *x*, the work (presumably written by the teacher) never specifies a corresponding domain for *θ*, which sets us up for an error. If the inverse sine had been mentioned in the description of the substitution, rather than only being introduced later, this fault would technically not be present, but it would still be easy to forget the implications that follow.
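The ambiguity is easy to see numerically. This Python sketch uses the value x = 1/2 from Doctor Rick's example; the list of candidate angles is just a sample.

```python
import math

x = 0.5

# Several different angles all satisfy sin(theta) = x ...
candidates = [math.pi / 6, 5 * math.pi / 6, math.pi / 6 + 2 * math.pi]
for theta in candidates:
    assert math.isclose(math.sin(theta), x)

# ... but the inverse sine picks exactly one of them, the one in [-pi/2, pi/2].
principal = math.asin(x)
print(math.isclose(principal, math.pi / 6))  # True
```

So "x = sin θ" alone leaves θ undetermined, while "θ = asin(x)" pins it down to the first candidate.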

The next error in the work shown is in replacing \(\sqrt{\cos^2\theta}\) by \(\cos\theta\). Unless we know that cos θ ≥ 0, we need to replace \(\sqrt{\cos^2\theta}\) by |cos θ| instead.

Here we’re still looking at the provided work using the sine. For the appropriate values of *θ*, the cosine is in fact positive, so this step is valid. But it is necessary to pay attention to this; work that blithely takes the root without considering its sign is essentially wrong work, even if the result happens to be correct!

How about the alternative work, using the cosine? That, too, is valid at this point: the sine is always positive in the interval of interest. But in other similar examples, there might be trouble.
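A short Python sketch of why the absolute value matters. The sample angles are arbitrary choices inside and outside the interval \(-\frac{\pi}{4}<\theta<\frac{\pi}{4}\).

```python
import math


def root_of_cos_squared(theta):
    """Evaluate sqrt(cos^2(theta)) literally, without simplifying."""
    return math.sqrt(math.cos(theta) ** 2)


# Inside (-pi/4, pi/4) the cosine is positive, so dropping the root is safe.
for theta in (-math.pi / 5, 0.0, math.pi / 5):
    assert math.isclose(root_of_cos_squared(theta), math.cos(theta))

# In quadrant II the cosine is negative, and only |cos(theta)| matches.
theta = 2 * math.pi / 3
assert math.isclose(root_of_cos_squared(theta), abs(math.cos(theta)))
assert not math.isclose(root_of_cos_squared(theta), math.cos(theta))
print("sqrt(cos^2) equals |cos| in both cases, and equals cos only in the first")
```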

These two errors together cause the situation that confused you. Doctor Fenton has already pointed these things out to you; I am just re-expressing them in terms of how to avoid errors in a substitution like this.

Actually, the real error in the work is a detail that hasn’t yet been touched upon! The key step in both solutions is $$y=\sin^{-1}(\sin2\theta) = 2\theta$$ But this is not always true. I discussed this in

Mixing Trig and Inverse Trig Functions

There I called this the “second Grant’s Tomb twist”: We are asking, “The sine of what angle is the sine of 2 theta?” But there are many such angles; what we are really asking, without realizing it, is, “The sine of what angle, **between \(-\frac{\pi}{2}\) and \(\frac{\pi}{2}\)**, is the same as the sine of 2*θ*?”

In the sine-substitution method, since we are told that \(-\frac{\pi}{4}<\theta<\frac{\pi}{4}\), we know that \(-\frac{\pi}{2}<2\theta<\frac{\pi}{2}\), which is in the range of the inverse sine, so it is true that \(\sin^{-1}(\sin2\theta) = 2\theta\). (Were you wondering why they choose that odd domain for *x*? This is why!)

But in the cosine-substitution method, we have to specify instead that \(\frac{\pi}{4}<\theta<\frac{3\pi}{4}\), and can conclude that \(\frac{\pi}{2}<2\theta<\frac{3\pi}{2}\), which covers the second and third quadrant. There, the inverse sine is *not* the same angle, but rather an angle in the first or fourth quadrant. This will be revealed in the graph we are about to see.
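The key step can be tested directly in Python. In this sketch the two sample angles are arbitrary representatives of the two θ-intervals just described.

```python
import math


def key_step(theta):
    """The expression that both solutions simplify to 2*theta."""
    return math.asin(math.sin(2 * theta))


# Sine substitution: theta in (-pi/4, pi/4), so 2*theta is in the range of asin.
theta = math.pi / 6
assert math.isclose(key_step(theta), 2 * theta)

# Cosine substitution: theta in (pi/4, 3*pi/4), so 2*theta falls outside that range.
theta = math.pi / 3
assert not math.isclose(key_step(theta), 2 * theta)
print(key_step(theta), 2 * theta)  # about 1.047 versus about 2.094
```

For the second angle the computed value is π/3, not 2π/3, so replacing the expression by 2θ silently changes the function.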

The fact that the work shown makes no reference to the domain of x, while the problem did so, should raise a warning in your mind. You ignored this domain restriction in your initial statement of the problem, but it turns out to be critical! When we graph the function you are differentiating, we discover that something important happens at the endpoints of the given domain:

Notice the sharp change in slope (derivative) at x = ±1/√2 ≈ ±0.707. If you complete the differentiation using the standard (chain rule) method that Doctor Fenton discussed, you will find that

$$\frac{dy}{dx} = \begin{cases}\dfrac{2}{\sqrt{1-x^2}}, & |x| < \frac{1}{\sqrt{2}}\\[1ex] -\dfrac{2}{\sqrt{1-x^2}}, & \frac{1}{\sqrt{2}} < |x| < 1\end{cases}$$

which shows that your two derivatives are both correct, but for different intervals of the full domain, [–1, 1]. (At x = ±1/√2 themselves the graph has a corner, so the derivative does not exist there.)

This is what I pointed out above, when I carried out the work just for the specified interval, but mentioned the sign change needed outside that interval.
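The two-piece derivative can be checked against a numerical derivative. The following Python sketch uses a central difference with an arbitrarily chosen step size; the sample points avoid the corners at ±1/√2 and the endpoints ±1, where the derivative does not exist.

```python
import math


def f(x):
    """The function being differentiated: arcsin(2x * sqrt(1 - x^2))."""
    return math.asin(2 * x * math.sqrt(1 - x * x))


def numeric_derivative(x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def piecewise_derivative(x):
    """+2/sqrt(1-x^2) inside |x| < 1/sqrt(2), -2/sqrt(1-x^2) outside."""
    sign = 1 if abs(x) < 1 / math.sqrt(2) else -1
    return sign * 2 / math.sqrt(1 - x * x)


for x in (-0.9, -0.5, 0.0, 0.3, 0.8):
    assert math.isclose(numeric_derivative(x), piecewise_derivative(x), rel_tol=1e-5)
print("numerical and piecewise derivatives agree at every sample point")
```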

What causes those sharp bends? Here is the same graph, of \(y=\sin^{-1}(2x\sqrt{1-x^2})\) (green), compared with the supposedly equivalent function, \(y= 2\sin^{-1}x\) (red dots), and Shreshth’s version, \(y= 2\cos^{-1}x\) (purple dots):

The inverse sine is restricted to \([-\frac{\pi}{2},\frac{\pi}{2}]\), so it “folds” either of the other two graphs back toward zero when they exceed the bounds. For instance, for \(\frac{\pi}{2}<\theta<\pi\), in quadrant II, \(\sin^{-1}(\sin\theta) = \pi-\theta\), as shown here, in an example using degrees:

And for \(\pi<\theta<\frac{3\pi}{2}\), in quadrant III, it is still true that \(\sin^{-1}(\sin\theta) = \pi-\theta\), as shown here:

As a result, here is the actual graph of \(y=\sin^{-1}(\sin\theta)\):
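The folding can also be spot-checked numerically. This Python sketch evaluates \(\sin^{-1}(\sin\theta)\) at one arbitrary angle in each of the three regions discussed above (angles in radians).

```python
import math


def fold(theta):
    """asin(sin(theta)): theta folded back into [-pi/2, pi/2]."""
    return math.asin(math.sin(theta))


# Inside the range of the inverse sine, nothing changes.
assert math.isclose(fold(1.0), 1.0)

# Quadrant II (pi/2 < theta < pi): the result is pi - theta.
assert math.isclose(fold(2.0), math.pi - 2.0)

# Quadrant III (pi < theta < 3*pi/2): still pi - theta, now negative.
assert math.isclose(fold(4.0), math.pi - 4.0)
print(fold(1.0), fold(2.0), fold(4.0))
```

Between π/2 and 3π/2 the output decreases as θ increases, which is the downward-sloping part of the zigzag.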

Shreshth sent several examples from his teacher, of which I’ll show only the first for now (and another later):

I have attached a few more examples given by my teacher:

It is entirely possible that the NCERT books (the books we use in India) did not mean for us to use the substitution method. But teachers here focus more on the easiest ways out with maximum marks, and over the years I have found that for the evaluators the solutions I attached are enough however wrong the solutions are.

So, I will just be using the chain rule from now on.

This particular example has a typo (an extra 2 in the denominator); my impression is that the problem is intended to be what was written below “Putting *x* = tan *θ*“.

Doctor Rick replied:

You did not answer my questions about the situations in which you are taught to use the substitutions, and how you are to decide which substitution to use. Your examples, however, suggest that this technique is taught specifically for functions that are inverse trig functions of certain classes of function, perhaps with a specific substitution to use for each class.

I myself prefer to minimize how many specific formulas I memorize. It would appear that they have a long list.

As Doctor Fenton suggested, this technique amounts to simplifying the function before differentiating, so that you only need to differentiate inverse trig functions of x. I would assume that you still sometimes encounter problems in which you can’t simplify that much, so that you need to use the Chain Rule.

This is one of the dangers of focusing on memorized formulas: Not every problem will be amenable to such a formula.

As your initial problem shows, the simplification has some tricky points that are easy to miss. Also, my guess is that you need to remember a number of rules, and they will not always apply. In the long term, the Chain Rule is one rule that will always work, so it is the most worth knowing. However, I recognize that schools everywhere probably “teach to the test”, and if the test is expected to have a lot of problems like these, I can understand specific methods being taught to solve them faster.

We do seem to see this sort of teaching particularly in South Asia, judging by the questions we get; but it can happen anywhere.

Your first example here, differentiating y = sin^{-1}(2x/(1+2x^{2})), is incorrect! After the substitution x = tan θ, the function was miscopied, so the wrong function was differentiated. When I differentiate the original function (using the Chain Rule), the result is not as simple as all your other examples. On the other hand, when I differentiate y = sin^{-1}(2x/(1+x^{2})) carefully, I find that it ought to have a domain restriction like the others (the derivative is different on |x| ≤ 1 than on |x| > 1).

So, even correcting the 2 in the denominator, the problem this time is stated incorrectly, by not specifying the domain. The solution implicitly assumes that \(\theta=\tan^{-1}x\) and therefore is in the interval \(-\frac{\pi}{2}<\theta<\frac{\pi}{2}\). But, as in the original problem, in order for it to be true that \(\sin^{-1}(\sin2\theta) = 2\theta\), we need to restrict further to \(-\frac{\pi}{4}<\theta<\frac{\pi}{4}\), corresponding to \(-1<x<1\).

Here is a graph of \(\displaystyle y=\sin^{-1}\left(\frac{2x}{1+x^2}\right)\), again with the given function in green, the claimed equivalent (valid only between -1 and 1), \(y=2\tan^{-1}x\), in red dots, and the equivalent for *x* > 1, \(y=2\cot^{-1}x\), in purple dots:

The correct “simplified” form of the function is $$\left\{\begin{matrix}2\cot^{-1}(x)-2\pi & \text{if }x<-1\\2\tan^{-1}(x) & \text{if }-1\le x\le 1\\2\cot^{-1}(x) & \text{if }x>1\end{matrix}\right.$$ And the derivative is negative outside of the interval \([-1,1]\).
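Here is a small numerical sanity check of that piecewise form. It is a sketch under one stated assumption: the inverse cotangent is taken with range \((0,\pi)\), written here as a hypothetical helper `acot`; the sample points and function names are mine.

```python
import math

def acot(x):
    """Inverse cotangent with range (0, pi); this convention is an assumption."""
    return math.pi / 2 - math.atan(x)

def given(x):
    """The original function, asin(2x / (1 + x^2))."""
    return math.asin(2 * x / (1 + x * x))

def simplified(x):
    """The three-piece 'simplified' form."""
    if x < -1:
        return 2 * acot(x) - 2 * math.pi
    if x <= 1:
        return 2 * math.atan(x)
    return 2 * acot(x)

def slope(x, h=1e-6):
    """Central-difference estimate of the derivative of the original function."""
    return (given(x + h) - given(x - h)) / (2 * h)

samples = [-50, -3, -1.5, -1, -0.4, 0, 0.7, 1, 2, 40]
max_gap = max(abs(given(x) - simplified(x)) for x in samples)
print(max_gap < 1e-9, slope(0.5) > 0, slope(2) < 0, slope(-2) < 0)
```

The last three values illustrate the sign change: the slope is positive inside \([-1,1]\) and negative outside it.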

Shreshth included four other examples, but I’ll just look at the last one he sent, which is a little different:

Here you are evidently expected to recognize the triple-angle tangent formula, which I don’t know on sight. An issue similar to the others arises here, but this time the graph is not continuous:

You may notice that the given function is undefined at the endpoints of the given domain, so I would add four open circles if I drew the graph by hand.

The correct “simplification” this time requires shifting the inverse tangent graph up on the left, and down on the right. We could write it as $$\left\{\begin{matrix}3\tan^{-1}(x)+\pi & \text{if }x<-\frac{1}{\sqrt{3}}\\3\tan^{-1}(x) & \text{if }-\frac{1}{\sqrt{3}}<x<\frac{1}{\sqrt{3}}\\3\tan^{-1}(x)-\pi & \text{if }x>\frac{1}{\sqrt{3}}\end{matrix}\right.$$

Again, the answers to all the questions as provided by the teacher are correct where a domain was specified, but the lack of any explanation of the circumstances that make them correct can lead students astray when the problem requires more care. Always be careful when you take an inverse function of a function that is not one-to-one! This applies to the “Grant’s Tomb problem”, and also to the square root of a square or a Pythagorean identity.

Last week I discussed several *Ask Dr. Math* questions about factoring quartic polynomials, which had been on my list of potential topics. That list also included a question on that topic from three years ago, that didn’t make it into the blog at the time. That will lead us into a 133-year-old algebra book, which explores the topic a little more deeply than modern books generally do. As a bonus, we’ll get many exercises for you to try!

The question is from John, on the first day of 2019:

I am very bad at factoring polynomials. For example,

x^{2} – 2x(a + b) – ab(a – 2)(b + 2) … [A]

I tried to find its factors by +/- (a + b)^{2} because a chapter on factoring in my book suggests I do that. Unfortunately this approach doesn’t lead anywhere and I am out of ideas.

Another example is

x^{3}(a + 1) – xy(x – y)(a – b) + y^{3}(b + 1) … [B]

I have no idea what to do here; I tried to open every expression and check if I can apply grouping since that’s what my book tells me to do.

That’s my biggest problem regarding factoring. I don’t know how to approach problems like that. My book has a list of strategies which can be used but most of the time none of them work.

So my question is how to think about these problems? I know long division and I understand the distributive property. I tried to guess possible factors then use long division to check if they give a satisfying answer but that seems very inefficient and time consuming. I tried to multiply different expressions and see what I get, but again this approach involves guessing. So can anybody explain factoring to me?

I’ve added the labels [A] and [B] so we can refer back to these problems later.

The initial attempt at [A] is presumably based on the “wishful thinking” approach we saw last time: We see that the polynomial looks like the beginning of \(x^2-2x(a+b)+(a+b)^2 = (x-(a+b))^2\), so we add and subtract that third term to see what happens, hoping perhaps that we will get a difference of squares. We don’t: $$x^2-2x(a+b)-ab(a-2)(b+2)\\ = x^2-2x(a+b){\color{Red}{+(a+b)^2-(a+b)^2}}-ab(a-2)(b+2)\\ = (x-(a+b))^2-(a+b)^2-ab(a-2)(b+2)$$ and the final terms don’t seem useful. There’s nothing wrong with that; factoring is the sort of endeavor where you may find yourself with the entire contents of your toolbox scattered on the workbench while you try one last thing. Then it may work … or it may not! As we pursue these problems and more, we’ll be buying ourselves a few new tools, and finding new uses for old ones.

I answered:

Hi, John.

I can factor the first by the usual method for quadratics: Look for a pair of “numbers” whose sum is -2(a + b) and whose product is -ab(a – 2)(b + 2). One of the natural choices works.

I have to admit that, looking at [A] just now, three years later, I didn’t think of that! Just as we would factor \(x^2-7x+12\) by looking for a pair of numbers whose product is 12 and whose sum is -7 (obviously -3 and -4), and concluding that the factors are \((x-3)(x-4)\), we can try various combinations of factors. Taking the product as \(\left[-ab\right]\cdot\left[(a-2)(b+2)\right]\) the sum will be $$\left[-ab\right]+\left[(a-2)(b+2)\right]=2a-2b-4$$ which is wrong.

If we tried \(\left[a(a-2)\right]\cdot\left[-b(b+2)\right]\), resulting in the sum $$\left[a(a-2)\right]+\left[-b(b+2)\right]=a^2-2a-b^2-2b$$ it would have the wrong degree, since we want the sum to be linear in *a* and *b*.

But what about taking the product as \(\left[-a(b+2)\right]\cdot\left[b(a-2)\right]\)? This gives the sum $$\left[-a(b+2)\right]+\left[b(a-2)\right]=-ab-2a+ab-2b=-2(a+b)$$

We’ve got it! The factorization is $$x^2-2x(a+b)-ab(a-2)(b+2)=(x-a(b+2))(x+b(a-2))\\ =(x-2a-ab)(x-2b+ab)$$
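A factorization like this is easy to double-check by letting a computer compare both sides. The sketch below is my own check, with illustrative function names; the polynomial is [A] as John stated it, with a minus sign on the last term. Since each side has degree at most 2 in each of the three variables, agreement on a grid with more than two values per variable is enough to confirm the identity.

```python
def original(x, a, b):
    """Problem [A] as stated by John."""
    return x**2 - 2 * x * (a + b) - a * b * (a - 2) * (b + 2)

def factored(x, a, b):
    """The claimed factorization, built from the pair -a(b+2) and b(a-2)."""
    return (x - a * (b + 2)) * (x + b * (a - 2))

def pair_sum(a, b):
    """Sum of the chosen pair; it should equal -2(a + b)."""
    return -a * (b + 2) + b * (a - 2)

values = range(-4, 5)
all_match = all(
    original(x, a, b) == factored(x, a, b)
    for x in values for a in values for b in values
)
sums_match = all(pair_sum(a, b) == -2 * (a + b) for a in values for b in values)
print(all_match, sums_match)
```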

At this point, however, I had no idea how to handle [B].

I would like to see the actual list of strategies you were taught (word for word), and exactly what they say as hints for these two problems. Presumably you are learning advanced techniques beyond the usual, at least in part, since we don’t usually have so many variables in a factoring problem.

What course is this, and what is the textbook?

John replied,

Hi Mr Peterson thank you for your response I am reading a book called A Treatise on Algebra by Charles Smith. I am taking pre calculus college course. I wanted to check some more challenging problems since the ones provided in class were too easy. My instructor is reluctant to advise on topics not related to the coursework provided by him because of the time constraints so I decided to ask here.

Factoring techniques presented by Smith are these:

- Factor out monomials.
- Check for expressions like a^{n} – b^{n} and other known identities.
- x^{2} + (a + b)x + ab = (x + a)(x + b) (now I see that it can be applied in the first problem above).
- Solve the quadratic then use its roots.
- Complete the square.
- Rearrange and group the terms.

There are examples provided in the book but they are very simple like

-ax^{3} – x^{2} + ax + 1 … [C];

I have no problems solving these. However I seem to get easily confused when the number of variables tends to increase like in the problems like this one

x^{4} – 2x^{2}a^{2} – 2x^{2}b^{2} + a^{4} + b^{4} – 2a^{2}b^{2} … [D]

Problems like multiplication and division have clear algorithms which I can follow; is there an algorithm for factoring?

The name of the book sounds old-fashioned, so John appears to have found a book from another generation, which may well cover material we no longer teach! And it is understandable that a teacher would consider this outside of his duties; this is *our* job!

Problem [C] is indeed a typical example of factoring by grouping: $$-ax^3-x^2+ax+1 = -x^2(ax+1)+1(ax+1)\\ =(1-x^2)(ax+1) =(1+x)(1-x)(ax+1)$$ I would usually factor out -1 first to avoid negative leading coefficients: $$-(ax^3+x^2-ax-1) = -(x^2(ax+1)-1(ax+1))\\ = -(x^2-1)(ax+1) = -(x+1)(x-1)(ax+1)$$ The important thing is that this only works for special cubics, and can’t be expected to work in general. That’s how factoring works.

Let’s try [D]. If we rewrite it as a polynomial in *x*, we get $$x^4-2(a^2+b^2)x^2+(a^4-2a^2b^2+b^4)$$ We can try to factor it, again as an ordinary quadratic trinomial: $$x^4-2(a^2+b^2)x^2+(a^4-2a^2b^2+b^4) = x^4-2(a^2+b^2)x^2+(a^2-b^2)^2$$ This is so close to the perfect square $$x^4-2(a^2+b^2)x^2+(a^2+b^2)^2 = (x^2-(a^2+b^2))^2$$ that I wonder if it could be a typo … but what if we try that “wishful thinking” idea again! Let’s add and subtract \(4a^2b^2\) to “make it so”:

$$x^4-2(a^2+b^2)x^2+(a^4-2a^2b^2+b^4)+4a^2b^2-4a^2b^2\\ = [x^4-2(a^2+b^2)x^2+(a^4+2a^2b^2+b^4)]-4a^2b^2\\ = (x^2-(a^2+b^2))^2-(2ab)^2\\ = [(x^2-(a^2+b^2))-(2ab)][(x^2-(a^2+b^2))+(2ab)]\\ = (x^2-(a^2+2ab+b^2))(x^2-(a^2-2ab+b^2))\\ = (x^2-(a+b)^2)(x^2-(a-b)^2)\\ = (x-(a+b))(x+(a+b))(x-(a-b))(x+(a-b))\\ = (x-a-b)(x+a+b)(x-a+b)(x+a-b)$$
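The chain of steps for [D] can be checked the same way, by comparing the starting polynomial, the intermediate difference of squares, and the final product of four linear factors on a grid of integers. This is only a sketch with names of my own choosing. Each expression has degree at most 4 in each variable, so nine values per variable are more than enough to confirm the identities.

```python
def original(x, a, b):
    """Problem [D] as John wrote it."""
    return (x**4 - 2 * x**2 * a**2 - 2 * x**2 * b**2
            + a**4 + b**4 - 2 * a**2 * b**2)

def difference_of_squares(x, a, b):
    """Intermediate form after adding and subtracting 4a^2 b^2."""
    return (x**2 - (a**2 + b**2))**2 - (2 * a * b)**2

def factored(x, a, b):
    """Final product of four linear factors."""
    return (x - a - b) * (x + a + b) * (x - a + b) * (x + a - b)

values = range(-4, 5)
all_match = all(
    original(x, a, b) == difference_of_squares(x, a, b) == factored(x, a, b)
    for x in values for a in values for b in values
)
print(all_match)
```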

We did it! And we just had to use several “basic” methods (notably the difference of squares) over and over, never giving up.

But I hadn’t solved this yet when I responded.

I was intrigued by the book, which on one hand sounds unexpectedly modern in its list of methods, but not in its selection of problems:

Thanks.

The techniques you listed are only the usual methods, not anything advanced – the list is entirely ordinary.

These examples, on the other hand, are extraordinary! I rarely, if ever, see factoring problems with parameters like a and b. As you found, the first yields to an ordinary method; I’m not convinced that the second does.

There is no mere algorithm for factoring; it is very much an art that requires skill. See the following pages for examples of more advanced techniques, as well as comments about the difficulty of factoring in general:

Factoring a Multivariate Polynomial: Strategies Beyond Grouping

Advanced Polynomial Factoring Methods

Factoring Quartic Expressions with No Real Zeros

Don’t try to read through all of those; I don’t know everything mentioned there myself! But notice that even these don’t include parameters.

Those links are to pages I looked at last week but considered too advanced to include in the blog; maybe we’ll look at them some other time. (To read them where they are, you’ll need to set up a free account.)

I used an idea from the first of those pages to work out [B]:

I found that I can use one of the methods mentioned to work out your second example. What I did was to observe that the polynomial is homogeneous in x and y (that is, the total degree of each term is the same, thinking of a and b as mere numbers), so that, being 3rd degree, it should factor as something like

(mx^{2} + nxy + py^{2})(qx + ry)

where each factor is homogeneous.

I expanded this, then matched up coefficients with the coefficients of your polynomial (after expanding it in x and y, keeping a and b in their parentheses); for instance, mq = (a + 1). Then, thinking of each factor like (a + 1) or (a – b) as a prime number, I looked for a solution to the system of four equations in m, n, p, q, r. This involved some trial and error, but not too much. Try that.

This technique of choosing a likely form for the factors and matching coefficients is one we saw last week. Let’s carry it out: $$(mx^2+nxy+py^2)(qx+ry) = mqx^3+(mr+nq)x^2y+(nr+pq)xy^2+pry^3$$ and $$x^3(a+1)-xy(x-y)(a-b)+y^3(b+1)=(a+1)x^3+(b-a)x^2y+(a-b)xy^2+(b+1)y^3$$

so equating coefficients gives the system $$\left\{\begin{matrix}mq=a+1\\ mr+nq=b-a\\ nr+pq=a-b\\ pr=b+1\end{matrix}\right.$$

The first equation implies that either \(m=1\) and \(q=a+1\), or \(m=a+1\) and \(q=1\); likewise, the last equation implies that either \(p=1\) and \(r=b+1\), or \(p=b+1\) and \(r=1\). This gives four possible combinations we can check out:

If \(m=1,\ q=a+1,\ p=1,\ r=b+1\),

then \((b+1)+n(a+1)=b-a\) and \(n(b+1)+(a+1)=a-b\),

both of which imply \(n=-1\).

If \(m=1,\ q=a+1,\ p=b+1,\ r=1\),

then \(1+n(a+1)=b-a\) and \(n+(a+1)(b+1)=a-b\),

so that \(\displaystyle n=\frac{b-a-1}{a+1}\) and \(\displaystyle n=-ab-2b-1\), which is inconsistent.

If \(m=a+1,\ q=1,\ p=1,\ r=b+1\),

then \((a+1)(b+1)+n=b-a\) and \(n(b+1)+1=a-b\),

so that \(\displaystyle n=-ab-2a-1\) and \(\displaystyle n=\frac{a-b-1}{b+1}\) which is inconsistent.

If \(m=a+1,\ q=1,\ p=b+1,\ r=1\),

then \((a+1)+n=b-a\) and \(n+(b+1)=a-b\),

so that \(n=b-2a-1\) and \(n=a-2b-1\) which is inconsistent.

(Once I found the first, I didn’t really need to try the others, since all I need is one set of coefficients.)

So our coefficients are $$m=1\\ n=-1\\ p=1\\ q=a+1\\ r=b+1$$ and we have the factorization $$x^3(a+1)-xy(x-y)(a-b)+y^3(b+1)=(x^2-xy+y^2)((a+1)x+(b+1)y)$$ which checks out when multiplied.
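The four-case search can be mimicked with specific numbers standing in for *a* and *b*. This is a sketch, not the method used in the original answer: the function names are hypothetical, and a numeric check is only evidence, since a special choice such as \(a=b\) can make an inconsistent case look consistent. It also assumes \(a\ne-1\) and \(b\ne-1\), so that no division by zero occurs.

```python
from fractions import Fraction
from itertools import product

def n_values(a, b, m, q, p, r):
    """Solve each of the two middle equations for n separately."""
    n_from_x2y = Fraction((b - a) - m * r, q)   # from m*r + n*q = b - a
    n_from_xy2 = Fraction((a - b) - p * q, r)   # from n*r + p*q = a - b
    return n_from_x2y, n_from_xy2

def consistent_cases(a, b):
    """Return (m, n, p, q, r) for every case where both equations give the same n."""
    cases = []
    mq_options = [(1, a + 1), (a + 1, 1)]       # ways to split m*q = a + 1
    pr_options = [(1, b + 1), (b + 1, 1)]       # ways to split p*r = b + 1
    for (m, q), (p, r) in product(mq_options, pr_options):
        n1, n2 = n_values(a, b, m, q, p, r)
        if n1 == n2:
            cases.append((m, n1, p, q, r))
    return cases

print(consistent_cases(2, 5))
```

With \(a=2\) and \(b=5\), only the first case survives, giving \(m=1,\ n=-1,\ p=1,\ q=3,\ r=6\), in agreement with \(q=a+1\) and \(r=b+1\).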

But we have to wonder whether the original partially factored form could have been of use. We’ll see another way shortly.

I don’t think you need to be anxious about being able to do these problems; but if you can, you will be far ahead of your class!

Now, having written all that, I looked for your book online, recognizing that the title looked old. Old books often are more challenging than modern ones.

Are you referring to this book, from 1888?

A Treatise on Algebra, By Charles Smith

I see your problems on pages 64-65. I haven’t looked carefully through its examples to see if there is a good parallel to this one; I’ll have to spend some time looking.

Here are the problems, which you may find interesting; our [A], [B], and [D] are #20, 32, and 36:

Here is part of what is taught before these problems, including our [C] as the first example:

When I teach the basic form of “factoring by grouping”, I generally recommend ordering terms in descending order by the powers of *x*; but examples 2 and 3 here are given to us already written that way, and choosing to order instead by the degree of a parameter hadn’t occurred to me.

After reading that, I had more to say:

Hi again!

I just read page 61 (section 85, about grouping), and decided to follow the suggestion of focusing on a, which occurs only in the first power. It worked like magic.

Collect the terms containing a; then notice that many of the remaining terms contain b, so collect those; and then collect the other two terms.

Now factor the GCF from each of these three groups of terms, and everything will fall into place (if you also notice a sum of cubes).

In other words, the technique being taught here is to look for the easiest thing to do, and hope that will work!

Here’s the work on [B] for this method:

$$x^3(a+1)-xy(x-y)(a-b)+y^3(b+1)\\ =a(x^3-xy(x-y))+b(xy(x-y)+y^3)+x^3+y^3\\ = ax(x^2-xy+y^2)+by(x^2-xy+y^2)+(x+y)(x^2-xy+y^2)\\ = (ax+by+x+y)(x^2-xy+y^2)\\ = ((a+1)x+(b+1)y)(x^2-xy+y^2)$$ as before, with considerably less work.

John wrote back:

Hi thanks for all the help. Thanks to you I managed to solve most the exercises. It seems so simple now almost embarrassing I did not notice those tricks immediately.

Nor did I.

I responded:

Most tricks seem simple once you see them.

This was fun.

As I often tell students, my definition of “fun” is “challenging”.

Then he had another question:

Hi can you help me with one more factoring problem?

(y – z)^{5} + (z – x)^{5} + (x – y)^{5} … [E]

I know that it has three factors (x-y)(z-x)(y-z) because if I replace x by y or z by x or y by z I get zero. However the degree is 5 so there has to be either two more factors or one factor of the second degree. I know this because there’s an example of a 5th degree polynomial in the book we talked about above and the author demonstrates how to factor it. Here is the example:

b^{2}c^{2}(b – c) + c^{2}a^{2}(c – a) + a^{2}b^{2}(a – b) … [F]

It factors into

-(b – c)(c – a)(a – b)(bc + ca + ab)

To begin with how does he know that the fourth factor will be of second degree and not two factors of the 1st degree? Secondly, how did he come up with this expression; was it just a guess? I know that the polynomials in the example and in my problem are symmetric and that symmetric polynomials are constructed from the simpler symmetric polynomials, so is there like a list of simple polynomials you can look at to help you guess the factor?

In answer to his last question here, which I never explicitly answered, what he wants are called elementary symmetric polynomials, which you can also read about here.

I responded:

I see the problem on page 73, and read the previous pages. The techniques there are similar to some in the pages on our site that I referred to earlier. It is interesting reading!

I imagine that if there were two linear factors, you could still find them by assuming a second-degree factor and discovering that it could be factored further, so it is not necessary to know which is the case.

Here are the problems on page 73, where [E] is exercise 2:

Here is part of what is taught leading up to these, on page 71:

And here is the work for example [F]:

In both of these examples, there is an element of guessing: He tries setting two variables equal to see if \(b-c\) might happen to be a factor. So in these problems, which are contrived to use the ideas he is teaching, we can expect such things to work!

I continued, correcting a slightly wrong implication there:

He explains the reasoning for this guess, namely that the factors can be expected to have the same symmetry as the original (though there is no guarantee, as explained in the last response in Factoring a Multivariate Polynomial: Strategies Beyond Grouping). Ultimately, yes, it is a hopeful guess (an “ad-hoc technique” in Dr. Vogler’s terms). So he assumes the squared terms all have the same coefficient, as do all the cross-terms like xy. That is required for symmetry.

Smith said that the new factor must be symmetrical; this is true **because the other factors (taken together) are**. But Doctor Vogler pointed out in that page that, in general, a symmetrical polynomial may have factors that are **not**, giving an example:

Not all symmetric polynomials factor into symmetric factors. For example, (x + y)(x + z)(y + z) is symmetric in three variables, but none of the factors is. However, a symmetric polynomial that factors into terms of different degrees will have symmetric factors. The reason is that permuting the variables doesn't have to leave all of the factors unchanged, but it does have to leave the collection of factors unchanged, which means that the most it can do is rearrange the factors. (Look at how swapping x and y, for example, will rearrange the factors in my example polynomial.)

Our quadratic factor must be symmetric, which means the coefficients of any terms that would be exchanged by permutation of the variables must be the same:

For the problem you are asking about, you can just follow the example, guessing that the factorization has the form

(y-z)(z-x)(x-y)(L(x^{2} + y^{2} + z^{2}) + M(yz + zx + xy))

The fifth degree terms must be 0, but you can look at the fourth degree terms to find the coefficients.

So let’s close by finishing the work for [E].

We assume that $$(y-z)^5+(z-x)^5+(x-y)^5\\ = (y-z)(z-x)(x-y)(Lx^2+Ly^2+Lz^2+Myz+Mzx+Mxy)$$

The product of three linear factors expands to $$xy^2+yz^2+x^2z-x^2y-y^2z-xz^2$$

Multiplying this by the quadratic factor, we get $$Lx^3y^2+Lx^2yz^2+Lx^4z-Lx^4y-Lx^2y^2z-Lx^3z^2\\ +Lxy^4+Ly^3z^2+Lx^2y^2z-Lx^2y^3-Ly^4z-Lxy^2z^2\\ +Lxy^2z^2+Lyz^4+Lx^2z^3-Lx^2yz^2-Ly^2z^3-Lxz^4$$ $$+Mxy^3z+My^2z^3+Mx^2yz^2-Mx^2y^2z-My^3z^2-Mxyz^3\\ +Mx^2y^2z+Mxyz^3+Mx^3z^2-Mx^3yz-Mxy^2z^2-Mx^2z^3\\ +Mx^2y^3+Mxy^2z^2+Mx^3yz-Mx^3y^2-Mxy^3z-Mx^2yz^2$$

This simplifies to $$-Lx^4y+Lx^4z+(L-M)x^3y^2+(M-L)x^3z^2\\ +(M-L)x^2y^3+(L-M)x^2z^3+Lxy^4-Lxz^4\\ -Ly^4z+(L-M)y^3z^2+(M-L)y^2z^3+Lyz^4$$

We could now expand the LHS, and equate coefficients. But we don’t really need to do all that to solve for two constants, do we? We only need two equations. As I suggested, we could just pick a couple terms in the expansion.

The term in the LHS with degree 5 in *x* will be \(-x^5+x^5 = 0\), which is good since the RHS can have no such term.

The terms in the LHS with degree 4 in *x* are \(5x^4z-5x^4y\); in the RHS we get \(-x^2(y-z)Lx^2=Lx^4z-Lx^4y\), so we know that \(L=5\).

What of the terms with degree 0 in *x*? For the LHS, we have $$(y-z)^5+z^5-y^5=-5y^4z+10y^3z^2-10y^2z^3+5yz^4$$ and for the RHS we have $$-yz(y-z)(Ly^2+Lz^2+Myz)=-Ly^4z-(M-L)y^3z^2+(M-L)y^2z^3+Lyz^4$$ Equating coefficients, we see again that \(L=5\), and that \(M-5=-10\) so that \(M = -5\).

So our factorization is $$(y-z)^5+(z-x)^5+(x-y)^5\\ = (y-z)(z-x)(x-y)(5x^2+5y^2+5z^2-5yz-5zx-5xy)\\ = 5(y-z)(z-x)(x-y)(x^2+y^2+z^2-yz-zx-xy)$$
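Since only two constants are unknown, two equations suffice, and there is another cheap way to get them: evaluate both sides at two numeric points instead of comparing coefficients. The sketch below takes that different route (it is not the method used above); the function names and sample points are my own.

```python
from fractions import Fraction

def lhs(x, y, z):
    return (y - z)**5 + (z - x)**5 + (x - y)**5

def linear_part(x, y, z):
    return (y - z) * (z - x) * (x - y)

def solve_L_M(p1, p2):
    """Each sample point gives one linear equation  L*s + M*t = quotient."""
    rows = []
    for (x, y, z) in (p1, p2):
        s = x * x + y * y + z * z          # multiplies L
        t = y * z + z * x + x * y          # multiplies M
        quotient = Fraction(lhs(x, y, z), linear_part(x, y, z))
        rows.append((s, t, quotient))
    (s1, t1, q1), (s2, t2, q2) = rows
    det = s1 * t2 - s2 * t1                # must be nonzero for this pair of points
    L = (q1 * t2 - q2 * t1) / det          # Cramer's rule
    M = (s1 * q2 - s2 * q1) / det
    return L, M

L, M = solve_L_M((0, 1, 2), (1, 2, 4))

def factored(x, y, z):
    return 5 * linear_part(x, y, z) * (x*x + y*y + z*z - y*z - z*x - x*y)

values = range(-3, 4)
identity_holds = all(
    lhs(x, y, z) == factored(x, y, z)
    for x in values for y in values for z in values
)
print(L, M, identity_holds)
```

The two sample points give the equations \(5L+2M=15\) and \(21L+14M=35\), whose solution is \(L=5\), \(M=-5\).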

Can the quadratic factor be factored further? I suspect not, for several reasons. But let me know if you find otherwise!


We’ll start with this, from Adam in 1998:

Nonlinear Factors

I need to factor x^4 + 4. I have been told that factoring the sum of two "squared" numbers is not possible; however, my instructor indicates this one can be done. I have had no luck. Is he pulling my leg?

We can factor a “difference of squares” like \((x^2)^2 - 2^2\), but this is a *sum* of squares, and that can’t be factored … unless you are allowed to use complex numbers. We’ll get back to that idea!

Doctor Pete answered, starting with what sort of factors we can expect:

Now, usually, when we think of factoring a polynomial, we are thinking of finding a form (x - a)(x - b)(x - c)... where a, b, c, ... are constants. But factoring is closely connected with a similar problem, which is finding roots of a function. In particular, if you have a polynomial function f(x), and solve the equation f(x) = 0, then what you are finding are the values of a, b, c, ... in the factored form. For example, say f(x) = x^2 - 4. Then, observe that f(2) = f(-2) = 0; that is, x = 2 and x = -2 are roots of f(x). Then f(x) has the factored form f(x) = x^2 - 4 = (x - 2)(x + 2).

So **linear** factors correspond to **real** roots (also called zeros) of the polynomial, and in particular to **rational** roots, when we are factoring “over the integers”, meaning we allow only integer coefficients. What if there are none?

However, in the case where f(x) = x^4 + 4, we see that there is no real value of x for which f(x) = 0, because x^4 is always nonnegative. So one cannot expect a factorization of the form (x - a)(x - b)(x - c)(x - d) where a, b, c, d are all real numbers. However, it may be possible to factor it as (x^2 + px + q)(x^2 + rx + s), that is, as a pair of irreducible quadratics. Why this may be possible will be clearer in a moment.

What can happen here is that the roots *a*, *b*, *c*, and *d* may in fact be **complex** numbers, which will come in conjugate pairs (assuming that the coefficients of the polynomial are integers, or more generally rational numbers), so that products of pairs of factors will form irreducible (unfactorable) quadratic factors. The same is true if some of *a*, *b*, *c*, and *d* are real but **not rational**.

For now, suppose that x^4 + 4 = (x^2 + px + q)(x^2 + rx + s) for some unknown constants p, q, r, s, which are all real numbers. By expanding the righthand side, we see that x^4 + 4 = x^4 + (p+r)x^3 + (q+s+pr)x^2 + (ps+qr)x + qs. If we equate the coefficients on both sides of this equation (why?), then we find that:

p + r = 0

q + s + pr = 0

ps + qr = 0

qs = 4

We equate coefficients because we want the equation to be true *for all x* (“identically equal”) so that the two sides are really the *same polynomial*.

Observe that we have four equations in four unknowns. That gives us hope that we can find a solution, though as a system of nonlinear equations, it is a little harder than you may be used to.

From the first equation p = -r, and substituting this into the third equation gives r(-s + q) = 0. This means r = 0, and/or q = s.

Suppose r = 0. Then p = 0, and we have from the second equation that q + s = 0. But this is impossible, since qs = 4. So we must have instead that q = s, and hence, from the fourth equation, q = s = 2 or q = s = -2.

Suppose q = s = -2. Then from the second equation, we have that -2 - 2 + pr = 0, which implies pr = 4. Since r = -p, we find that -p^2 = 4, which is impossible.

Now suppose q = s = 2. From the second equation, we then have 2 + 2 + pr = 0, or pr = -4. Since p = -r, it follows that r^2 = 4. Therefore, r = 2 or -2, and p = -2 or 2. Note that this makes sense, since this produces the factorization x^4 + 4 = (x^2 + 2x + 2)(x^2 - 2x + 2), and clearly p and r are interchangeable, as long as they have opposite signs. One can check that the factorization is correct by multiplying out the righthand side.

Now, to gain an understanding of why the polynomial x^4 + 4 factors into two quadratics but not four linear factors, we attempt to find the roots of, say, x^2 + 2x + 2, which is one of the quadratics. Clearly, any root of this polynomial is also a root of x^4 + 4. Using the quadratic formula, we find that the roots are given by x = (-2 + Sqrt[-4])/2, (-2 - Sqrt[-4])/2, or x = -1 + i, -1 - i, where i is the square root of -1. These numbers are complex numbers, which explains why a linear factorization can't be found.

Alternatively, we could go ahead and treat the polynomial as a **difference of squares**: $$x^4 + 4 = x^4 - (-4) = (x^2)^2 - (2i)^2 = (x^2 - 2i)(x^2 + 2i)$$ so that $$x = \pm\sqrt{2i}\text{ or } x = \pm\sqrt{-2i}$$

There are several ways to find the square root of a complex number; the easiest typically is to use polar (or exponential) form, but we can also suppose that $$(x + yi)^2 = 2i\\ (x^2 - y^2) + 2xyi = 2i\\ x^2 - y^2 = 0,\ 2xy = 2\\ x=y=\pm1$$ and similarly for the other case. So our full linear factorization (over the complex numbers) is $$(x - (-1 + i))(x - (-1 - i))(x - (1 + i))(x - (1 - i))$$

The product of the first two (complex conjugate) factors is $$((x+1)-i)((x+1)+i) = (x+1)^2-i^2 = x^2 + 2x + 1 + 1 = x^2 + 2x + 2$$ We’re just following Doctor Pete’s process in reverse.
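Both factorizations of \(x^4+4\) are easy to confirm by multiplying out, which is tedious by hand but trivial for a computer. Here is a minimal sketch; `poly_mul` is a helper of my own, with polynomials stored as coefficient lists, highest power first.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

# The two real quadratic factors found by matching coefficients.
quadratic_product = poly_mul([1, 2, 2], [1, -2, 2])

# The four complex linear factors (x - root), multiplied one at a time.
linear_product = [1]
for root in (-1 + 1j, -1 - 1j, 1 + 1j, 1 - 1j):
    linear_product = poly_mul(linear_product, [1, -root])

target = [1, 0, 0, 0, 4]   # x^4 + 4
print(quadratic_product == target, linear_product == target)
```

The complex coefficients here are small whole numbers, so the floating-point arithmetic is exact and the comparison with the integer list is safe.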

The next question is from 1996 (anonymous):

Factoring Quartics

Can you help me factorize f(x) = x^4 - 6x^3 + 11x^2 - 6x + 1 and solve f(x) = 0?

Doctor Liu answered:

Solving polynomial equations of degree higher than 2 (quadratic) generally involves some guesswork. For example, you first try the RATIONAL ROOT TEST to see if it has any rational roots. If none, then the problem is certainly more difficult, and with luck (or if the problem is designed to be solvable), you factor it into a product of QUADRATIC polynomials.

RATIONAL ROOT TEST

The only possible rational roots of a polynomial (set equal to zero) are rational numbers whose: (i) numerators are divisors of the CONSTANT term; (ii) denominators are divisors of the COEFFICIENT of the HIGHEST degree term.

For the present case, since the constant term and the leading coefficient are both 1, we need only test the rational numbers 1 and -1. Direct calculation shows that neither of these satisfies the equation, so the equation has NO rational root.

This is the first method I would usually try. When the coefficients have few factors, it is commonly the fastest way. Here, since the only candidates are 1 and -1, we just use synthetic division with these two numbers (equivalent to evaluating the polynomial for those values), and we have eliminated this possibility.
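The rational root test is mechanical enough to write down as a short program. The following is a sketch under the usual assumptions (integer coefficients, nonzero constant term); the function names are my own.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_root_candidates(coeffs):
    """coeffs: integer coefficients, highest power first, nonzero constant term."""
    leading, constant = coeffs[0], coeffs[-1]
    candidates = set()
    for p in divisors(constant):
        for q in divisors(leading):
            candidates.update({Fraction(p, q), Fraction(-p, q)})
    return sorted(candidates)

def evaluate(coeffs, x):
    """Horner's rule, the same arithmetic as synthetic division."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def rational_roots(coeffs):
    return [r for r in rational_root_candidates(coeffs) if evaluate(coeffs, r) == 0]

quartic = [1, -6, 11, -6, 1]          # x^4 - 6x^3 + 11x^2 - 6x + 1
print(rational_root_candidates(quartic), rational_roots(quartic))
```

For this quartic the only candidates are -1 and 1, where the polynomial takes the values 25 and 1, so the list of rational roots comes back empty.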

There is a “formula” for quartic equations (as also for cubic equations, but not for any higher degree), but I don’t think I’ve ever even tried to use it!

As we saw above, when there are no rational roots, there are no linear factors (over the integers), which leaves only one possibility for the factors:

The only way to solve the equation is FACTORIZATION into a product of two QUADRATIC factors. Generally, we would try (x^2 + ax + b)(x^2 + cx + d). However, in the present case, observe that the polynomial is SYMMETRIC. So, we try to see if it is possible to arrange these quadratic factors to be SYMMETRIC as well. In other words, we try to factor it in the form (x^2 + ax + 1)(x^2 + cx + 1).

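The symmetry referred to is that the coefficients are 1, -6, 11, -6, 1, reading the same in both directions. But we could also come close to the same conclusion merely because the constant term is 1, so the constant terms of both factors (assuming integer coefficients) must be both 1 or both -1; the second option is quickly ruled out, since it would make the coefficients of \(x^3\) and \(x\) opposites.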

Expanding this product, we have:

(x^2 + ax + 1)(x^2 + cx + 1)

= x^4 + ax^3 + x^2 + cx^3 + acx^2 + cx + x^2 + ax + 1

= x^4 + (a+c)x^3 + (ac + 2)x^2 + (a+c)x + 1.

Comparing with the given polynomial, we would like to have:

a + c = -6

ac + 2 = 11 ---> ac = 9

The only possibility is a = c = -3. If you can see this immediately, that is wonderful, and you should proceed directly to the next paragraph.

As before, we equate corresponding coefficients because they must be the same polynomial, term by term.

If not, you find these numbers a and c by first eliminating one of them, and see that you run into a QUADRATIC equation:

-6 - a = c

a(-6 - a) = 9

-6a - a^2 = 9

a^2 + 6a + 9 = 0

(a+3)^2 = 0

a = -3

From this, c = -3 as well.

The quick way is to think as we do when factoring a quadratic, and just list possible products giving 9, looking for a pair of factors whose sum is -6. They are obviously -3 and -3.

Since *a* and *c* are the same, the two quadratic factors are the same.

This means that the given polynomial is indeed a SQUARE: x^4 - 6x^3 + 11x^2 - 6x + 1 = (x^2 - 3x + 1)^2. Its roots are therefore those of x^2 - 3x + 1 = 0. Now, by the quadratic formula, we get x = (3 +/- Sqrt 5)/2. Each of these is counted twice as a root of the 4th degree equation.
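The search for *a* and *c*, the squaring, and the two roots can all be checked in a few lines. This is just a sketch; the search range and helper names are my own choices.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

def evaluate(coeffs, x):
    """Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Integer pairs (a, c) with a + c = -6 and a*c = 9.
pairs = [(a, -6 - a) for a in range(-9, 10) if a * (-6 - a) == 9]

# Squaring x^2 - 3x + 1 should reproduce the quartic.
square = poly_mul([1, -3, 1], [1, -3, 1])

quartic = [1, -6, 11, -6, 1]
roots = [(3 + math.sqrt(5)) / 2, (3 - math.sqrt(5)) / 2]
residuals = [abs(evaluate(quartic, r)) for r in roots]
print(pairs, square == quartic, max(residuals) < 1e-9)
```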

Here is the graph of the polynomial, showing double zeros at \(\frac{3+\sqrt{5}}{2} \approx 2.618\) and \(\frac{3-\sqrt{5}}{2} \approx 0.382\):

Our third question is from John in 2004:

Solving a Quartic Equation with Substitutions

I'm trying to solve y(y + 1)(y + 2)(y + 3) = 7920, which is a problem from my friend's kid. First I multiplied it all out:

(y^2 + y)(y + 2)(y + 3) = 7920

(y^3 + 3y^2 + 2y)(y + 3) = 7920

y^4 + 6y^3 + 11y^2 + 6y - 7920 = 0

I am not able to solve it by factoring. Am I on the right track? I haven't done any maths for more than 10 years, and I think I forgot almost everything I learned! Can you please give me a hint?

Doctor Douglas answered:

Hi John. Your work so far is fine, and you are trying to factor this last equation: y^4 + 6y^3 + 11y^2 + 6y - 7920 = 0. You can factor the equation in a number of ways:

1. You can divide through by guesses such as (y-3), (y+6), (y-8), ... and see if any of these happen to work. Because this was given as a problem for a schoolkid, I'm guessing that the roots are probably integers, so this is not an unreasonable approach.

This could be done by starting with the rational root theorem; or it could be done by simply guessing at a solution to the original equation. Since \(y(y+1)(y+2)(y+3) = 7920\) is not far from \(y^4 = 7920\), we might just try values of \(y\) near \(\sqrt[4]{7920}\approx 9.43\); trying \(y=8\) we find that \((8)(9)(10)(11) = 7920\), where the average of the four factors is 9.5. Or, one could just factor the number 7920 and try to arrange the prime factors as a product of consecutive integers. But one would still wonder (if one had the mathematician gene) whether this is the only (real) solution! We’ll see …
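The guessing approach can be sketched in a few lines of Python. This is only an illustration: the search window of 1 to 19 is an arbitrary choice that brackets the fourth root of 7920, and it looks at positive integers only.

```python
def product_of_four(y):
    """Left side of the equation: y(y+1)(y+2)(y+3)."""
    return y * (y + 1) * (y + 2) * (y + 3)

# Fourth root of 7920, the rough size of y we expect.
estimate = 7920 ** 0.25

# Try positive integers around that estimate (window chosen arbitrarily).
positive_solutions = [y for y in range(1, 20) if product_of_four(y) == 7920]

print(round(estimate, 2), positive_solutions)
```

A search like this finds candidates but says nothing about solutions outside the window, which is exactly the worry raised above.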

2. Another way to do this is to realize that the four roots are equally-spaced, because the factors come in a nice arithmetic progression: y, y+1, y+2, y+3. So let's average those and define u = y + 3/2 as the center of this set of four numbers, and the equation becomes

(u - 3/2)(u - 1/2)(u + 1/2)(u + 3/2) - 7920 = 0

This is nice, because the factors multiply out such that the cross terms cancel:

[(u - 3/2)(u + 3/2)][(u - 1/2)(u + 1/2)] - 7920 = 0

(u^2 - 9/4)(u^2 - 1/4) - 7920 = 0

There is a lot of insight hidden here! (If you’re curious, we’ll be coming back to how he chose that transformation.) He paired up the factors to make two differences of squares, which is a nice way to save work.

And if we make one more substitution, using v = u^2, this IS a quadratic equation in terms of v:

(v - 9/4)(v - 1/4) - 7920 = 0

v^2 - 10v/4 + 9/16 - 7920 = 0

16v^2 - 40v + 9 - 16*7920 = 0

16v^2 - 40v - 126711 = 0

Now you can factor this quadratic trinomial using many methods, or you can use the quadratic formula with a = 16, b = -40 and c = -126711. You will find that this quadratic equation factors as follows:

16*(v - 90.25)(v + 87.75) = 0

and has roots of v = 90.25 or -87.75

If we want to avoid decimals (as I typically do), we can split the 16 between the other factors, obtaining the equation $$(4v-361)(4v+351)=0$$ The roots are \(\frac{361}{4}\) and \(-\frac{351}{4}\).

Since v = u^2, the second root leads to no real solution for u, and we must have

v = 90.25

u^2 = 90.25

u = sqrt(90.25)

u = 9.5 or -9.5

which means that going back to (u - 3/2)(u - 1/2)(u + 1/2)(u + 3/2), our set of y's is either {8, 9, 10, 11} or {-11, -10, -9, -8}. This is a tough problem because of the substitution steps, so don't feel bad about not being able to do it!

So there are, in fact, two real solutions, not just the one positive solution we could find easily!
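The whole chain of substitutions can be retraced numerically. The following is a minimal sketch using the quadratic formula on 16v^2 - 40v - 126711 = 0 and then undoing v = u^2 and u = y + 3/2; the variable names are chosen here for illustration.

```python
import math

# Quadratic in v from the text: 16v^2 - 40v - 126711 = 0
a, b, c = 16, -40, -126711
disc = b * b - 4 * a * c
v_roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]

# v = u^2 needs v >= 0 for a real u; then y = u - 3/2 undoes the shift.
real_y = []
for v in v_roots:
    if v >= 0:
        u = math.sqrt(v)
        real_y.extend([u - 1.5, -u - 1.5])
real_y.sort()

print(v_roots, real_y)
```

Only the positive root of the quadratic contributes real values of y, matching the two real solutions found above.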

Reversing all the substitutions, our equation factors as $$(4v - 361)(4v + 351) = 0\\ (4u^2 - 361)(4u^2 + 351) = 0\\ (2u - \sqrt{361})(2u + \sqrt{361})(4u^2 + 351) = 0\\ \left(2\left(y+\frac{3}{2}\right) - 19\right)\left(2\left(y+\frac{3}{2}\right) + 19\right)\left(4\left(y+\frac{3}{2}\right)^2 + 351\right) = 0\\ (2y-16)(2y+22)\left(4\left(y^2+3y+\frac{9}{4}\right) + 351\right) = 0\\ 4(y-8)(y+11)(4y^2+12y+9+351) = 0\\ 4(y-8)(y+11)(4y^2+12y+360) = 0\\ 16(y-8)(y+11)(y^2+3y+90) = 0$$

John replied,

Thanks for the prompt reply. I followed most of your work, but I'm a little confused by the step where you chose u = y + 3/2. Why do I need to define "u" as the center of the set of numbers, not the beginning or end of the numbers? Is this a maths theory?

Doctor Douglas responded:

That's a very good question! Mostly this was simply an inspired guess, guided by our desire to take advantage of the left-right symmetry of the roots. By doing so, we separate the odd (y^3 and y^1) terms from the even (y^4, y^2, y^0) terms, and the latter set is what leads to our quadratic trinomial via the substitution v = u^2. Note that this trick worked only because of the nice progression of factors {y, y+1, y+2, y+3}. It would have been much tougher to work with the set {y, y+1, y+2, y+4}.
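To see concretely what centering buys us, here is a small sketch that expands the product before and after the shift, using exact fractions. The helpers `poly_mul` and `expand` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def expand(roots):
    """Expand the product of (t - r) over the given roots r."""
    poly = [Fraction(1)]
    for r in roots:
        poly = poly_mul(poly, [Fraction(1), -Fraction(r)])
    return poly

# y(y+1)(y+2)(y+3): roots 0, -1, -2, -3
original = expand([0, -1, -2, -3])

# (u - 3/2)(u - 1/2)(u + 1/2)(u + 3/2): roots symmetric about 0
centered = expand([Fraction(-3, 2), Fraction(-1, 2), Fraction(1, 2), Fraction(3, 2)])

print(original)   # coefficients of y^4, y^3, y^2, y, 1
print(centered)   # coefficients of u^4, u^3, u^2, u, 1
```

In the centered version the u^3 and u coefficients vanish, which is what allows the further substitution v = u^2.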

John closed:

Thanks Dr. Math for helping me. Your work was interesting and your comments helpful. I appreciate it!

I used the same technique in Fun with a Quartic Equation, which has links to some of these answers, and more.

Here is a graph of the function on the left of our equation in its original form:

We can see here that the function is symmetrical about the line \(x=-\frac{3}{2}\); the substitution shifted the graph to the right 1.5 units, so that it would be symmetrical about the *y*-axis, making it an even function and therefore easy to solve.

In order to see the solutions (where this function crosses the line \(y=7920\)), we need to zoom out considerably:

Finally, we have a question from Zubin in 2005:

Factoring x^4 + (x^2)(y^2) + y^4

Is there any mathematical way or pattern to factor the polynomial x^4 + (x^2)(y^2) + y^4, since it does factor into the polynomials x^2 + xy + y^2 and x^2 - xy + y^2, which are then not factorable? At first it seemed simply a square of two sums, but unfortunately the middle term is not doubled. The only way I could factor this was by guessing and checking. I find it very difficult to put this polynomial into any category. It does not factor by grouping, synthetic factorization, division, etc. I have now been led to believe that I have not yet gotten to the point in my education where I can factor this. This was a question that arose when my pre-calculus teacher had to review factoring.

Doctor Schwa answered:

Hi Zubin - What you need is the problem-solving strategy called "wishful thinking". I WISH that the question were x^4 + 2x^2 y^2 + y^4, because then I could factor it into (x^2 + y^2)^2. How can I make it so? Well, I can't just change the question, so if I add an x^2 y^2 to it to fit my wish, I must also subtract an x^2 y^2. Does that hint help you see how to find the factors?

Here is what happens: $$x^4 + x^2y^2 + y^4 = (x^4 + 2x^2y^2 + y^4)-x^2y^2\\ = (x^2+y^2)^2-(xy)^2 = (x^2+y^2+xy)(x^2+y^2-xy)$$ And that’s what Zubin had found.
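A numerical spot check of this identity is easy to write. The sketch below compares both sides on a small grid of integers; the grid size is an arbitrary choice, and agreement on sample points is a sanity check rather than a substitute for the algebra above.

```python
import itertools

def original(x, y):
    return x**4 + x**2 * y**2 + y**4

def factored(x, y):
    return (x**2 + x*y + y**2) * (x**2 - x*y + y**2)

grid = range(-5, 6)   # sample values; any integers would do
identity_holds = all(original(x, y) == factored(x, y)
                     for x, y in itertools.product(grid, grid))

print(identity_holds)
```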

The next way might be described as an instance of recognition of a familiar pattern:

There's another method, too, which is to know how to factor a difference of cubes:

(a^3 - b^3) = (a - b)(a^2 + ab + b^2).

Now, if we divide both sides by (a - b), you can see that what you have matches the pattern of the right side:

(a^3 - b^3)/(a - b) = (a^2 + ab + b^2)

Setting a = x^2 and b = y^2, we have:

(x^6 - y^6)/(x^2 - y^2) = x^4 + x^2 y^2 + y^4

Now, to work from there, you can factor x^6 - y^6 as a difference of SQUARES instead of a difference of cubes. Do you see where that leads? Feel free to write back and let me know how it goes, or if you'd like more hints along either of those two paths to the solution.

So far, this doesn’t look promising, because we want a product of polynomials, not a quotient. But let’s try it: $$x^4+x^2y^2+y^4 = \frac{x^6-y^6}{x^2-y^2} = \frac{(x^3-y^3)(x^3+y^3)}{(x-y)(x+y)}\\ = \frac{x^3-y^3}{x-y} \frac{x^3+y^3}{x+y} = (x^2+xy+y^2)(x^2-xy+y^2)$$ where we used the difference of cubes again at the end, together with the matching formula for a sum of cubes.

That’s a cute trick, not generally applicable, and again discovered largely by seeing things you might do, and doing them, hoping it will help.


Here is the question, from Carson at the end of October:

Say one digs a tunnel through the Earth. The angle of descent determines the egress point and depth of the tunnel.

Starting from the surface, how would one determine the angle to dig if they wanted to enter at 100° and exit at 80°? or 110° to 70°?

How would one determine the depth of the deepest point [the 90-degree point]?

Another way to question this would be, “If one were to dig a tunnel at 10° into the Earth, starting at the 100° point, and digging a straight line, how would they determine the degree point where exiting? How would they determine the length of the tunnel? How would they determine the tunnel’s greatest depth from the surface?”

Thank you for your help.

Carson

The question was actually spread out over three initial messages, and the question became clearer as they came in! There are really several questions here: Given the destination, what direction will you be digging, and how deep will you go? Given the direction, what will be the destination, distance, and depth?

I answered:

Hi, Carson.

I’m glad you included the picture, because I was very confused at first: I thought the angles you listed were angles of inclination (“enter at 100°” as a direction), which made no sense. But they are apparently locations, something like longitudes (if you were on the equator).

My first thought is that all that matters is the difference in the locations, which is the measure of the arc you are subtending. For example, digging from 110° to 70°, you subtend a 40° arc.

The specified longitudes **on the equator** are not on land (the closest being a tunnel from Thailand to Sri Lanka under the Indian Ocean); **on another parallel of latitude**, distances corresponding to a given angle would be smaller, and longitudes would not correspond to angles along a great circle route. We could also use locations **on the same longitude line**, but latitudes are not measured greater than 90 degrees, so that doesn’t fit his example.

But a 40 degree arc amounts to **any** distance of about 2765 miles on the surface, and 20 degrees to about 1382 miles. The former might be from Bellingham, Washington, to Miami:

… and the latter from Cody, Wyoming, to Meridian, Mississippi (the middle half of that route):

The route is a Great Circle, which appears curved on a map; thanks to Google Maps for the nice tool!

We wouldn’t normally talk about this as covering 40 degrees of the earth, or identify the start and end points by angles like 110°; but I think this is the best interpretation we can make for the question. Picture slicing the earth with a plane through the center and both end points, and then marking off degrees on the resulting circle, and it will fit what he asked.

The length of the tunnel is a chord (c), and the greatest depth is the height (sagitta, h), of the arc, as shown in Wikipedia:

The angle at which you start digging would be θ/2, where θ is the measure of the arc (e.g. in my example above, θ = 40°, so you would start digging at 20° to the surface).

The formulas for c and h are given in the Wikipedia link, in terms of known values of R and θ.

The formulas as shown in Wikipedia are as follows:

The chord length is $$c=2R\sin{\frac{\theta}{2}}=R{\sqrt{2(1-\cos\theta)}}$$

The sagitta is $$h=R\left(1-\cos{\frac{\theta}{2}}\right)=R\left(1-{\sqrt{\frac{1+\cos\theta}{2}}}\right)$$

Let’s make our own picture, so I can better explain these answers, and the formulas:

Incidentally, this picture shows the reason for some of the names of its parts. The word “arc” (*s*) originally meant “bow” (as in *arch*ery); “chord” (*c*) meant “cord” or “string”; and “sagitta” (*h*) meant “arrow” (as in the constellation *Sagitta*rius, the archer). Can you see the latter nicely fitting to the string before being pulled back?

From right triangle ADC, we see that $$\sin{\frac{\theta}{2}} = \frac{c/2}{R} = \frac{c}{2R},$$ and solving this for \(c\) gives the first formula above.

Also from ADC, $$\cos{\frac{\theta}{2}}= \frac{R-h}{R} = 1 - \frac{h}{R},$$ which we solve for \(h\) to get the second formula.

The alternative forms come from the half-angle formulas for sine and cosine.

Other, related formulas (but involving only distances, not angles) are discussed in the article How Much Does the Earth Curve?

And why is the angle at which we dig \(\frac{\theta}{2}\), as I said, and as shown in the diagram above? One way to see this is that \(\angle CAD\) is the complement of \(\frac{\theta}{2}\), and the angle between the chord and the tangent is the complement of that. The complement of the complement is the original angle.

This leads to the answer to his second form of the question:

If you start digging at 10° to the surface, then you know you will come out 20° from the start (that is, 20/360 = 1/18 of the circumference of the earth). Of course, this depends on being able to dig in an exact line maintaining that direction.

You’d probably need to use some sort of laser beam to keep the tunnel straight; the slope, as measured by a plumb line or level, would be constantly changing as you reach “bottom” and start climbing “uphill” while still going straight!

Interestingly, this tells us that the distance (as measured along the ground) from start to exit is proportional to the angle at which you dig — all the way until you dig vertically, 90 degrees from horizontal, and come out 180 degrees from the start. The linear distance, of course, is not at all proportional.

Incidentally, the fact that the angle is half the angle subtended is an instance of the third theorem here:

This site lists several “circle theorems” relating angles between chords, tangents, and secants of a circle to the central angles of the arcs intercepted. Our case is essentially identical to the “tangent-chord angle theorem”.

Let’s look at our two examples. The radius of the earth is \(R\approx 3,960\) miles (depending on how you measure it; see here for details). Using that, and converting angles to radians as needed:

From **Bellingham to Miami**, with \(\theta=40^\circ\), the surface distance is \(\frac{40^\circ}{180^\circ}\pi\cdot 3960\approx 2765\text{ miles}\approx 4450\text{ km.}\)

The tunnel length is \(c=2(3960)\sin{\frac{40^\circ}{2}}\approx 2709\text{ miles}\approx 4360\text{ km.}\)

The tunnel depth is \(h=3960(1-\cos{\frac{40^\circ}{2}})\approx 239 \text{ miles}\approx 385\text{ km.}\)

From **Cody to Meridian**, with \(\theta=20^\circ\), the surface distance is \(\frac{20^\circ}{180^\circ}\pi\cdot 3960\approx 1382\text{ miles}\approx 2224\text{ km.}\)

The tunnel length is \(c=2(3960)\sin{\frac{20^\circ}{2}}\approx 1375\text{ miles}\approx 2213\text{ km.}\)

The tunnel depth is \(h=3960(1-\cos{\frac{20^\circ}{2}})\approx 60\text{ miles}\approx 97\text{ km.}\)
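These calculations are easy to reproduce. Here is a minimal Python sketch of the chord and sagitta formulas; the function name `tunnel` and the constant `EARTH_RADIUS_MILES` are introduced for illustration, and the radius is the approximate value of 3,960 miles used above.

```python
import math

EARTH_RADIUS_MILES = 3960  # approximate value used in the text

def tunnel(theta_deg, radius=EARTH_RADIUS_MILES):
    """Return (surface distance, tunnel length, greatest depth, dig angle in degrees)
    for a straight tunnel subtending a central angle of theta_deg."""
    theta = math.radians(theta_deg)
    arc = radius * theta                           # distance along the surface
    chord = 2 * radius * math.sin(theta / 2)       # tunnel length
    sagitta = radius * (1 - math.cos(theta / 2))   # greatest depth
    return arc, chord, sagitta, theta_deg / 2

for theta_deg in (40, 20):
    arc, chord, sagitta, dig = tunnel(theta_deg)
    print(f"{theta_deg} deg: arc {arc:.0f} mi, tunnel {chord:.0f} mi, "
          f"depth {sagitta:.0f} mi, dig angle {dig:.0f} deg")
```

Because the dig angle is half the central angle, the second form of the question (start at 10° to the surface) corresponds to calling `tunnel(20)`.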

The shorter tunnel is about half as long (and quite close to the surface distance), but ¼ as deep. This is expected, as the cosine behaves like a parabola near its peak. Here is what the two tunnels look like, drawn to scale, echoing Carson’s own drawing:

Earth’s crust extends something like 20 to 30 miles under continents, so both tunnels would go into the mantle. Temperature and pressure would be somewhat uncomfortable!
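Returning to the observation that the shorter tunnel is only about ¼ as deep, here is a short small-angle estimate that makes the parabola remark concrete. It is a sketch that assumes \(\theta\) is measured in radians and is small enough that \(\cos x \approx 1 - \frac{x^2}{2}\) is a good approximation:

$$h = R\left(1-\cos\frac{\theta}{2}\right) \approx R\cdot\frac{1}{2}\left(\frac{\theta}{2}\right)^2 = \frac{R\theta^2}{8}$$

Halving \(\theta\) therefore divides the estimated depth by 4. For \(\theta = 40^\circ \approx 0.698\) rad this estimate gives about 241 miles, and for \(20^\circ\) about 60 miles, close to the values computed above.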

After writing this, I had a nagging sense that it reminded me of something, and I finally remembered what it was: The Alameda-Weehawken Burrito Tunnel, which I read about a year ago. The story is well worth reading (especially in early April). This incredible high-speed tunnel is a little shorter than our 40 degree tunnel (which might go from San Francisco to Bangor rather than New York City):

The story includes length and depth information, the latter in a graph:

In 1911, the celebrated British civil engineer Basil Mott approached the plutocrat Andrew W. Mellon with an audacious plan to build a straight-line tunnel 2500 miles long connecting New York City with San Francisco, allowing packages to be sent between the two cities using only compressed air and gravity. The tunnel would resemble the pneumatic tube systems that had served New York City and Paris so well for mail delivery, but on an incomparably vaster scale. …

You may initially think that the map is wrong, showing a straight line; but if you look closely, you’ll see that it is simply a different projection, in which this great circle appears straight. Let’s see how the fiction compares to our formulas.

We can repeat our calculations from above:

The central **angle** subtended is \(\theta=\frac{s}{R} = \frac{2557}{3960}\approx 0.6457\text{ rad}\approx 37^\circ\).

The tunnel **length** is \(c=2(3960)\sin{\frac{37^\circ}{2}}\approx 2513\text{ miles}\approx 4044\text{ km.}\)

The tunnel **depth** is \(h=3960(1-\cos{\frac{37^\circ}{2}})\approx 205 \text{ miles}\approx 330\text{ km.}\)
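Here the starting point is a surface distance rather than an angle, so the only new step is \(\theta = s/R\). A minimal sketch, using the same figures as above (the variable names are chosen for illustration, and unlike the text this does not round the angle to 37° before continuing):

```python
import math

R = 3960   # miles, the approximate Earth radius used in the text
s = 2557   # miles along the surface, the figure used in the text

theta = s / R                           # central angle in radians
length = 2 * R * math.sin(theta / 2)    # chord: tunnel length
depth = R * (1 - math.cos(theta / 2))   # sagitta: greatest depth

KM_PER_MILE = 1.609344
print(f"angle {math.degrees(theta):.1f} deg, "
      f"length {length:.0f} mi ({length * KM_PER_MILE:.0f} km), "
      f"depth {depth:.0f} mi ({depth * KM_PER_MILE:.0f} km)")
```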

The reported length of 2500 miles is reasonably accurate. The graph above shows a slightly lower maximum depth of about 270 km rather than 330. The story also correctly notes that the tunnel dips into the asthenosphere (which extends from 100-200 km down to about 700 km), which is part of the upper mantle. But I think the temperatures in the graph are underestimates. You don’t really want to go that deep, and the rock is not rigid there.

You may also want to think about the accuracy of the shape of the graph. I’ll leave that as an exercise for the reader.

For a modified and extended version of the story, watch this video. They got the depth about right, and added more science. As for the history … have fun with it!

With few new questions of general interest available this week, I thought I’d go back a few months to a couple little questions on a topic we haven’t dealt with lately, combinatorics. We’ll have one question each on permutations and combinations, showing some subtlety in both the methods we use and the wording of the problem.

Both questions came from Jonathan, in mid-August. Here is the first:

I have taken the following question from the book Basic Probability, Henk Tijms, 2021.

Five football players A, B, C, D and E are designated to take a penalty kick after the end of a football match. In how many orders can they shoot if A must shoot immediately after C? How many if A must shoot after C?

For the first question I have taken C and A to be one player. So the problem becomes one of the permutations of four players, CA, B, D and E. The solution is simple and is 4! = 24.

For the second question, I have broken it down into four parts and added them to obtain the total.

So, with C first (Cxxxx), there are 4! = 24 permutations.

With C second, (xCxxx), B, D or E can be first, so there are 3! +3! + 3! = 18 permutations.

With C third (xxCxx), there are 3 ways to choose the first position (B, D or E), two ways to choose the second (one has been consumed in the first position), two for the last two positions (the remaining one of B, D or E occupying one place, A the other). So there are 3 × 2 × 2= 12 permutations.

Finally, with C fourth, as A is last there are 3! ways of filling the first three places. So there are 6 permutations.

Adding these gives 24 + 18 + 12 + 6 = 60 permutations.

The answers to both parts of the question agree with those given in the book.

My question is that my solution to the second part of the question seems clunky and brute-force; I can’t help thinking there is a more terse and elegant way of achieving the answer.

Do you have any suggestions?

His thinking is good. But is there a simpler way for the second problem? We’ll offer two.
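With only 5! = 120 orders, Jonathan's counts can also be confirmed by listing every order. The following is a brute-force sketch (not a method from the book) that counts both questions and reproduces his case-by-case split according to C's position.

```python
from itertools import permutations

orders = list(permutations("ABCDE"))   # all 120 shooting orders

# First question: A shoots immediately after C.
immediately_after = sum(1 for p in orders if p.index("A") == p.index("C") + 1)

# Second question: A shoots some time after C.
after = sum(1 for p in orders if p.index("A") > p.index("C"))

# Jonathan's casework: split the second count by C's position (1st to 4th).
by_position = [sum(1 for p in orders if p.index("C") == k and p.index("A") > k)
               for k in range(4)]

print(immediately_after, after, by_position)
```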

Doctor Fenton answered:

Hi Jonathan,

Thanks for writing to the Math Doctors.

Another way to count the possibilities in the second question is to choose the two places for A and C. There are \({}_5C_2\) = 10 ways to do that, and they must be filled in the order C A from left to right. Having chosen the two places for A and C, there are three remaining slots for the other three players, which can be filled in 3! = 6 ways, giving a total of 10*6 = 60 orders for the players.

So here we are first choosing two places to put A and C, and placing them in a fixed order so that C is first (e.g. _ C _ _ A); then filling in the remaining blanks with the other three players, B, D, and E (e.g. D C E B A). There are 10 ways to do the first part, and, for each of those, 6 ways to do the second.

I saw an alternative:

Hi, Jonathan. I want to add another way to do it!

The way I saw the problem, any arrangement will either have A before C, or C before A. In fact, they come in pairs, so exactly half will have A after C. So we can just count all arrangements (5! = 120) and divide by 2.

One of the things I enjoy about combinatorics is that there is so often more than one way to solve the same problem, and we need that, because it is so easy to make mistakes, and checking by doing it two ways increases my confidence.

Of the 120 arrangements, we can pair up each one (e.g. D C E B A) with the one obtained by swapping A and C (D A E B C). One fits the requirement, the other doesn’t. So there are the same number of arrangements that are allowed as those that are not.
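Both shortcuts, and the pairing that justifies the second one, can be checked mechanically. In this sketch the helper `swap_a_c` is a hypothetical name for the operation of exchanging A and C while leaving everyone else in place.

```python
from itertools import permutations
from math import comb, factorial

choose_places = comb(5, 2) * factorial(3)   # pick 2 slots for C, A; arrange the rest
half_of_all = factorial(5) // 2             # the pairing argument

def swap_a_c(order):
    """Exchange A and C, leaving the other players where they are."""
    return tuple({"A": "C", "C": "A"}.get(p, p) for p in order)

c_first = {p for p in permutations("ABCDE") if p.index("C") < p.index("A")}
a_first = {p for p in permutations("ABCDE") if p.index("A") < p.index("C")}

# Swapping A and C should carry the allowed orders exactly onto the disallowed ones.
pairing_is_exact = {swap_a_c(p) for p in c_first} == a_first

print(choose_places, half_of_all, pairing_is_exact)
```

If the swap maps one set exactly onto the other, the two sets have the same size, so each is half of 120.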

Jonathan replied,

Doctors Fenton and Peterson, many thanks for your answers. Doctor Peterson, I had thought about halving the total but could not convince myself that it was correct as I was concerned what happens when A and C occupy the fourth and fifth slots. Thinking about it again, it makes sense. I am pleased I have confirmation that this more intuitive solution is correct — as you say, it gives one confidence. Thank you both again.

Taking the time to convince yourself that an intuitive feeling is right, is what makes you a mathematician! Too many students either go with whatever feels right (which sometimes works!), or are afraid to try unless they know from the start what to do. Getting an idea and testing it is something we all need practice with.

Here is the second question, submitted the next day:

I have taken the following question from the book Basic Probability, Henk Tijms, 2021.

John and Pete are among 10 players who are to be divided into two teams A and B, each consisting of five players. How many formations of the two teams are possible so that John and Pete belong to a [?sic] team?

I have reasoned thus:

The team with John and Pete can be formed in \({}_8C_3\) ways, giving 8!/(3!5!) = 56.

The answer is given as 112. I can see that this is double the answer I have, suggesting that the second team needs to be taken into account. But surely, once the first team is chosen the members of the second team are forced.

I am concerned I have missed something fundamental here.

Are you able to put me right?

I have often commented that English can be the hardest part of a math problem. What does “John and Pete belong to a team” mean? In different contexts, it might mean that each of them belongs to a team, or that they belong to the same team, or that they both belong to some particular team. Jonathan has called our attention to the word “a”, which seems to be poorly chosen; but the main difficulty is a little deeper than that.

[Incidentally, in editing this post, I found the problem in the book online, and it actually says,

In my own dialect, “to a *same* team” is ungrammatical, or at least non-idiomatic; I have seen this form used by speakers of other languages, where perhaps there is a similar construction. I understand it here to mean that they are on the same team but we don’t care / know which it is.]

In general, in combinatorics we need to determine what entities are “distinct” or “distinguishable” (e.g. the players, and the teams), and what outcomes are to be counted as “different” (e.g. mere team membership as opposed to order within a team). There is something a little more subtle in this problem.

I answered:

Hi, Jonathan.

The issue here is the subtlety of the wording of combinatorial problems. You’ve observed, I think, that “a team” is to be taken as meaning “the same team”; the wording could have been clearer. Similarly, we need to determine what differences to take into account in our answer.

The key is that the teams are given specific names, A and B, which makes them distinguishable. So having John and Pete on team A is different from having them on team B. So in counting, we need to first choose which team to put them on (2 ways), and then choose who else to put on that team (\({}_8C_3\) = 56). The total is then 2*56 = 112.

Jonathan’s method in effect put John and Pete on one team, chose 3 of the other 8 players to join them, and put the rest on the other team. So he is leaving the teams unnamed, just thought of as “our team” and “the other team”:

J P _ _ _ vs _ _ _ _ _

We need to also assign names to the two teams, in order to do what was asked:

**A:** J P _ _ _ vs **B:** _ _ _ _ _

or

**A:** _ _ _ _ _ vs **B:** J P _ _ _

This is why one of the first things I do when I’m faced with a problem like this is to ask, “Are the items (here, people) distinguishable? Are the groups (here, teams) distinguishable?” That often determines how I approach the problem. Also, I often find myself unsure!

Here, everything would change if “a team” were “team A”! It would also change if the teams had not been named, and we were just asked, “In how many ways can two teams be formed?”

If it were specified that John and Pete were on a specific named team, we would not have to choose, removing that factor of 2; if the teams were not named, we would likewise have no such choice. We would have indistinguishable teams.
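The two readings can be compared by enumeration. In this sketch, choosing the five members of team A determines team B; the player labels `P3` to `P10` are placeholders invented for the eight unnamed players.

```python
from itertools import combinations

# 10 players; P3..P10 are placeholder names for the other eight.
players = ["John", "Pete"] + [f"P{i}" for i in range(3, 11)]

same_team = 0        # named teams A and B, John and Pete together on either one
both_on_team_a = 0   # John and Pete specifically on team A
for team_a in combinations(players, 5):
    john_in_a = "John" in team_a
    pete_in_a = "Pete" in team_a
    if john_in_a == pete_in_a:      # both on A, or both on B
        same_team += 1
    if john_in_a and pete_in_a:
        both_on_team_a += 1

print(same_team, both_on_team_a)
```

The first count corresponds to the book's answer with named teams, and the second to Jonathan's count, which is also what we would get if the teams were indistinguishable.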

Jonathan replied,

Doctor Peterson, Thank you for your answer. I understand this now. It appears to be all in the wording as you say. In the question as given, the teams are distinct. If the question asked, “How many ways could a team of five be made?”, then 56 would be correct. Is that so?

Here we are only making one team, and ignoring the 5 not on it:

Team: _ _ _ _ _

I responded:

Correct, assuming you are requiring John and Pete to be on that team. It would be different if you required that if either of them is on the team, then the other must also be on it.

That is, his calculation assumes John and Pete are on the one team, which makes this equivalent to his original calculation where he assumed they are on the *first* team:

On team: J P _ _ _ vs not on team: _ _ _ _ _

But if we only require them **either** both to be on the team, **or** both off, then we again have two cases to count:

On team: J P _ _ _ vs not on team: _ _ _ _ _, or On team: _ _ _ _ _ vs not on team: J P _ _ _

Jonathan asked,

Is there a distinction between the two?

I said,

Suppose the problem said this:

John and Pete are among 10 players, from whom 5 are to be picked for a special team. How many formations of the two teams are possible if John and Pete insist on being together, so that if one is on the team, then the other must be on it as well?

Then the possibilities include the \({}_8C_3\) cases where both are on it, together with the \({}_8C_5\) cases where neither is on it.

If two (indistinguishable) teams were being picked, there would be no difference, because one team would have both and the other would have neither; but here I’m starting from your question, “How many ways could a team of five be made?”.
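The arithmetic for this "both or neither" version is short enough to write out; the variable names below are chosen for illustration.

```python
from math import comb

both_on = comb(8, 3)      # John and Pete on the team, choose 3 of the other 8
neither_on = comb(8, 5)   # neither on the team, choose all 5 from the other 8
total = both_on + neither_on

print(both_on, neither_on, total)
```

The two cases are equal in size because choosing 3 of 8 to include is the same as choosing 5 of 8 to leave out.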

Jonathan answered:

Thank you for clarifying this.

I can see the whole area of combinatorics is subtle and vexed and careful reading (and wording) is essential.

It is indeed!

Here is an interesting little question about how drilling a hole affects volume and surface area. We’ll have one answer, and several explanations.

It came from Harkirat, in late October (as part of a discussion starting with a different question):

Hi.

I am trying to help my daughter with her studies. I’m not sure if I have solved this question correctly or not. Can someone check and confirm if I’ve solved it right or not:

My solution based on the logic:

Since cylinder is cut out from the centre, the volume of the solid left after taking out the cylinder will have lesser volume. So, volume will decrease.

Increase in surface area = CSA of cylinder = 2πrh = 2πr * 1 unit = 2πr

Decrease in surface area = 2 * Area of circle = 2 * πr^2 = 2πr^2

Intuitively, I feel 2πr > 2πr^2 => Surface area will increase.

So B is correct.

Is my reasoning correct?

Any help will be appreciated.

It’s obvious that the volume decreases, so some aspects of the problem seem silly. But there’s something tricky, and surprising, about the surface area! (I’ll ignore the fact that the picture doesn’t quite look right – the bottom of the hole shouldn’t be visible at this angle …)

Let’s take a moment to examine his work, which is good. How does the surface area change?

When we drill the hole,

we **remove two circles** from the top and bottom surfaces, and **add the area of the sides** of the hole (which is Harkirat’s CSA, “curved surface area”, which I would call the lateral surface area of the cylinder):

So the increase in area is \(2\pi r h - 2(\pi r^2)\), which, since the height \(h\) is 1, becomes \(2\pi r - 2\pi r^2 = 2\pi (r - r^2)\). Harkirat *feels* that this increase is positive. Is it?

Doctor Fenton answered with some questions:

Your work is correct, but it appears that you are looking for a justification that 2πr > 2πr^2.

The circular cross section of the cylinder cut out of the cube fits within the square with side length 1, so what does that tell you about the diameter (and radius) of the cylinder?

As a general fact, if 0 < x < 1 (i.e. x is a positive number less than 1), then how does x compare with x^2? (Remember that you can multiply an inequality a < b by a positive quantity c to obtain ac < bc, which is also a valid inequality.)

The idea he is implying is that we can multiply both sides of the inequality \(x<1\) by the positive number \(x\) to get the inequality \(x^2<x\), which leads to the required fact.

Because of problems submitting that question, Harkirat had already submitted the same question a second time, with the more specific question,

Hi Friends.

Can someone look into my solution and help me understand how the surface area increases?

Thanks

Doctor Rick answered in that thread:

Hi, Harkirat.

I see that you submitted the same problem as a follow-up to another Math Doctor, and he may have some remarks in that thread, but I will give my perspective here.

You are correct that the volume of the new solid is less than that of the cube (which seems to be what they mean by “decrease”). And that leaves only one option, so unless we are allowed to choose none of the answers, it seems that we must choose that one.

So we can answer the problem with hardly any thought at all. But doing so is not very satisfying!

But is it true that the surface area increases? That’s the interesting question here. You thought it through, and decided correctly that the change in surface area is 2πr - 2πr^2 (where r is the radius of the cylinder). Factoring out 2π, we have ΔS = 2π(r - r^2). The question is now, is (r - r^2) greater than or less than zero?

Your intuition is that r > r^2, which would make the change in surface area ΔS positive, so the surface area increases. I wonder where that intuition comes from? If I take a simple example, say r = 3, then r^2 = 9, and r < r^2. This is the intuition many students would have, I suspect, but not you!

There is, indeed, something else going on here. Why is my example inappropriate for this problem?

Probably the reason many people would think that \(r < r^2\) is that they are accustomed to squares of whole numbers. The square of 3, 9, is greater than 3 itself, for example. So we tend to think that squaring a number increases it. I am reminded of our post, How Can Multiplication Make It Smaller?.

Harkirat wrote back, explaining his reasoning:

Hi Dr. Rick,

Thank you for your help.

Since the edge of the cube is given as 1 unit and the circular base is much smaller, it means that the radius of the circular base is in decimals, say 0.3 unit. Then r = 0.3 but r^2 = (0.3)^2 = 0.09.

Obviously, 0.09 is less than 0.3 and hence 2*pi*r > 2*pi*r^2. That is what I thought was the reasoning behind this answer.

Am I correct?

This is good reasoning from an example, nicely focusing on numbers smaller than one, which is the key.
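Harkirat's single example can be extended to several radii at once. This is a minimal sketch of the change in surface area for a hole of radius r through a cube of edge 1; the function name `surface_area_change` and the sample radii are chosen for illustration, with the largest sample at the geometric limit r = 0.5.

```python
import math

def surface_area_change(r, h=1.0):
    """Lateral area of the hole gained, minus the two circular ends removed."""
    return 2 * math.pi * r * h - 2 * math.pi * r**2

# Radii that fit inside a unit square face (r can be at most 0.5).
samples = {r: surface_area_change(r) for r in (0.1, 0.3, 0.5)}

for r, change in samples.items():
    print(f"r = {r}: change in surface area = {change:.3f}")
```

Every radius that fits in the cube gives a positive change, which agrees with the example of r = 0.3.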

Doctor Rick responded with other ways to do the same thinking:

Yes, and we can do more than trying some examples (though that is enough to generate an intuitive feel for what’s happening).

Depending on what your child (or you) has learned about algebra, there are several algebraic ways we might consider the question of whether r - r^2 is positive or negative. One thing we might do is to graph the function y = x - x^2. (I just changed to variables that are more familiar in graphing; if you’re OK with it, we could stick with r.) Where does that graph go below the x axis?

Here is the graph:

The function \(y=x-x^2=x(1-x)\) is a quadratic opening downward, equal to zero at \(x=0\) and \(x=1\), and therefore positive between them. We’re only interested in positive values of the radius, of course.

Apart from graphing, there are standard techniques for solving polynomial inequalities, which is another way of looking at what we have here: For what values of r is r - r^2 < 0?

Such an inequality is often solved by locating the zeros of the function on a number line, and determining its sign in each region between those values (below 0, between 0 and 1, and greater than 1, in this case). That does, indeed, produce the same result as our graph. In fact, since this technique is often carried out by testing a value in each region, Harkirat’s own thinking is closely related: He tried “a decimal”, that is, a number between 0 and 1, namely 0.3, and found the difference to be positive.
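The test-point technique just described can be sketched in a few lines of Python. This is only an illustration: the name `g` and the particular test points are arbitrary choices, not part of the original exchange (apart from 0.3, which is Harkirat’s own example).

```python
def g(r):
    # The quantity whose sign we care about: r - r^2 = r(1 - r)
    return r - r**2

# The zeros of g are 0 and 1; pick one test point in each region they cut out.
test_points = {"r < 0": -1.0, "0 < r < 1": 0.3, "r > 1": 3.0}

signs = {}
for region, r in test_points.items():
    value = g(r)
    signs[region] = "positive" if value > 0 else "negative"
    print(f"{region}: g({r}) = {value:.2f} ({signs[region]})")
```

Only the middle region comes out positive, which matches what the graph shows.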

Another approach: going back to graphing, you could graph the two functions y = x and y = x^{2}. Where does the second graph fall below the first? That’s where x^{2} < x.

This is probably what first comes to my mind. Here is the graph:

We can see that \(x^2\) is below \(x\) between 0 and 1, again.

The result of any of these investigations will be that x^{2} < x when 0 < x < 1. We can put algebra aside and think about it this way: When you square a number greater than 1, you get a greater number than you started with, because you’re multiplying the starting number by a number (the same number) that is greater than 1. When you square a positive number less than 1, you get a number less than you started with, because you multiplied the starting number by a number less than 1.

And in this problem, we know that r < 1/2.

This brings us back to the post on multiplication I referred to above: Multiplying by a (positive) number less than one decreases it. This, too, was probably on Harkirat’s mind. There is a lot of good intuition there!

Harkirat had a further question:

Hi Rick,

Thank you for the detailed explanation. It makes sense.

However, the trouble is that this is an MCQ and hence has to be answered quickly and the child hasn’t got the time to go through the extra work. So, I’ll explain and show to her what you’ve suggested but to solve this I’ll give her the reasoning that I’ve gone with. So, she’ll have deeper understanding as well but will be able to solve such questions quickly too.

Thanks a ton!!

That is exactly right.

Doctor Rick answered,

That’s fine. There’s a place for teaching test taking strategies, as long as we don’t ignore opportunities to explore math more deeply.

As I said, if I had to choose fast, I would just choose B by process of elimination based on volume alone, and we don’t need to think about surface area at all. But if “MCQ” means “multi-correct question” rather than “multiple choice question”, and if “multi-correct” includes the possibility of “none correct”, then some further thought is needed, and what you did is reasonable.

There can be a lot of good thought behind a simple question. I suspect the question was not written as carefully as we have thought about it!

After writing this, it occurred to me that a question like this should not depend on the units used to measure the area; whether the surface is measured in square centimeters or in acres, drilling a hole should still increase the surface area. So the result is not dependent on the edges of the cube being 1 unit long. Therefore, the issue is not, ultimately, that the radius is a decimal. So let’s see what happens if we start with a 3-inch cube, or a 5-meter cube, or whatever.

Let’s say the side is \(s\) units, and the hole has radius \(r\) units:

The whole cube has a surface area of \(6s^2\) square units. We remove two circles with a total area of \(2(\pi r^2)\) square units, and add a cylindrical surface of \(2\pi r s\) square units. The new area is $$6s^2+2\pi r s-2\pi r^2 = 6s^2+2\pi r (s-r)$$

Suddenly we can see the answer instantly: Since \(r<s\) (in fact, \(r<\frac{1}{2}s\)), the area is clearly increasing. And the reason is not really that \(r\) is a decimal, but that it is less than the side of the cube, which is also the height of the hole. (The \(r\) we worked with before was really the ratio of the radius to the side.)
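As a quick numerical sanity check of that formula, here is a small Python sketch. The function name `area_change` and the sample sizes are illustrative assumptions; the only thing taken from the discussion is that the change in area is \(2\pi r s - 2\pi r^2\), with the hole fitting inside the face (\(0<r<\frac{1}{2}s\)).

```python
import math

def area_change(s, r):
    """Change in surface area when a hole of radius r is drilled through a cube of side s."""
    if not 0 < r < s / 2:
        raise ValueError("the hole must fit inside the face: 0 < r < s/2")
    added = 2 * math.pi * r * s      # lateral surface of the cylindrical hole
    removed = 2 * math.pi * r**2     # the two circular openings
    return added - removed           # algebraically equal to 2*pi*r*(s - r)

# A unit cube, a 3-unit cube and a 5-unit cube (arbitrary sample radii)
for s, r in [(1, 0.3), (3, 1.2), (5, 2.0)]:
    print(f"s={s}, r={r}: change = {area_change(s, r):.4f}")
```

Every admissible pair gives a positive change, whatever the unit of length, including the second case, where the radius is greater than 1.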

]]>Sometimes we have lots of quick questions and a number of long discussions, neither of which seems suitable for a post. This time I’ve chosen to combine two distantly related questions, one recent and one from several months ago, both involving tangent lines to functions.

The first is from Akhtar, in mid-October:

Hi Dear sir,

I want to gain some important knowledge about the following question.

Find the equation of tangent line to the graph of f(x) = 3x – 7 at (3, 2).

We find equation of tangent by y – y_{1} = m(x – x_{1}) using this we get y = 3x – 7 (same to given line).

Because every straight line is its own tangent. Thus we get the same equation.

But graphically if we draw the line and then draw tangent to it which will coincide with the line.

So both lines touch at infinitely many points, so how it remains tangent?

Please discuss and remove my confusion.

Best regards

Normally, when we find a tangent line to a curve, it looks something like this:

The tangent line just touches the curve at one point. (In fact, that’s what “tangent” means in Latin: “touching”.) But in this problem, the “curve” *is* the line!

How can we call that a tangent?

Doctor Fenton answered:

I think the issue here is what the meaning of a tangent line is. On my first day in a calculus class, the professor asked that question: What is a tangent line to a curve?

Someone offered a common description: a line which touches the curve at only one point. The professor then drew a curve like y = x^{2}, and the vertical line x = 0, which does touch the curve at only one point, the origin. But this vertical line is not our intuitive concept of a tangent line, while the horizontal line y = 0 fits our intuitive idea (and is the correct tangent).

This first attempt is, in fact, used as a definition of a tangent line *to a circle* in geometry; there, it makes sense. But in calculus, we are talking about all sorts of curves, and that definition doesn’t work any more.

Here is what the professor might have drawn:

Both red lines intersect the curve in only one place, but we’d never call the vertical line a tangent.

Then someone offered a modification, saying that the line must not cross the curve. Then the professor drew a curve like y = x^{3}, and the line y = 0 (the x-axis), which of course is the correct tangent line, but it does cross the curve!

Here is what he drew:

Now, he could also have drawn this:

This time, the line doesn’t cross the curve *at the point of tangency*; but it crosses *elsewhere*.

So, the definition of “tangent line” can’t be merely that it intersects in only one place, because that can be *true* of *non*-tangent lines, and also can be *false* of tangent lines. And it can’t be that it intersects without crossing, because that can be *false* of a tangent line.

(Incidentally, some of these ideas are touched upon (no pun intended) in the post Tangents Without Calculus.)

So how can we define “tangent” to fit our intuition?

I don’t recall if any other modifications were offered, but the point is that the only definition of a tangent line which does yield our intuitive concept is that the tangent line to a curve y = f(x) at a point (a, f(a)) on the curve passes through the point (a, f(a)), and has the slope

$$m = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}.$$

This is the calculus definition: The tangent line is simply the line through the given point whose slope equals the **derivative** of the curve, which in turn is defined as a **limit**. One important thing about this approach is that it is a **local** concept: It focuses on what is happening to the curve *near* the point of interest, not far away (e.g. whether the line intersects the curve somewhere else). In effect, the tangent line (solid red) is the limit of secant lines (broken red) passing through the point, as the other point (blue) approaches the point of tangency:

Geometrically, the idea is that the tangent line is the “best” straight line approximation to the curve y = f(x) at the point (a, f(a)). Any line passing through this point has the form y = m(x – a) + f(a). If two lines both pass through (a, f(a)), y = L_{1}(x) = m_{1}(x – a) + f(a) and y = L_{2}(x) = m_{2}(x – a) + f(a), then the difference between the two lines at x is d = (m_{1} – m_{2})(x – a). The size of this difference at a point (x, f(x)) some distance away from (a, f(a)), relative to the displacement (x – a), is |(m_{1} – m_{2})(x – a)|/|x – a|, which is a constant |m_{1} – m_{2}|.

That is, the distance *d* between two lines decreases linearly as we approach their common point, so the ratio of that distance to the horizontal distance is a constant equal to the difference in their slopes:

The red line is the tangent L_{1}, and we can see that it hugs the curve more tightly the closer we get to A:

So if the slope m_{1} is defined by the limit

$$m_1 = \lim_{x\to a}\frac{f(x)-f(a)}{x-a},$$

then

$$\begin{aligned}\lim_{x\to a}\frac{f(x)-L_1(x)}{x-a} &= \lim_{x\to a}\frac{f(x)-\left(m_1(x-a)+f(a)\right)}{x-a}\\ &= \lim_{x\to a}\frac{\{f(x)-f(a)\}-m_1(x-a)}{x-a}\\ &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a}-m_1\\ &= 0.\end{aligned}$$

Here we are looking at the distance from L_{1} to the curve; this time, the ratio is not a constant, but decreases to zero. This implies a “close fit”. What about other lines?

For any line L_{2}(x) = m_{2}(x – a) + f(a) with m_{2} ≠ m_{1}, the difference [f(x) – L_{2}(x)]/(x – a) can be written as

$$\frac{f(x)-L_2(x)}{x-a} = \frac{f(x)-L_1(x)}{x-a} + \frac{L_1(x)-L_2(x)}{x-a}$$

so that

$$\lim_{x\to a}\frac{f(x)-L_2(x)}{x-a} = m_1 - m_2.$$

There is only one slope m for which this limit is 0: m_{1}. That’s why there is only one tangent line.

Only the one line at the proper slope allows this ratio to approach zero.
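To watch this happen numerically, here is a small Python sketch. The curve \(f(x)=x^2\), the point \(a=1\), and the competing slope \(m_2=3\) are assumptions chosen purely for illustration; nothing here comes from Doctor Fenton’s answer beyond the ratio itself.

```python
def f(x):
    return x**2          # sample curve (an assumption for illustration)

a = 1.0
m1 = 2.0                 # f'(1) = 2, the slope of the tangent at x = 1
m2 = 3.0                 # some other slope through the same point

def line(m, x):
    # Line of slope m through (a, f(a))
    return m * (x - a) + f(a)

def ratio(m, x):
    # [f(x) - L(x)] / (x - a): vertical gap relative to horizontal distance
    return (f(x) - line(m, x)) / (x - a)

for h in [0.1, 0.01, 0.001]:
    x = a + h
    print(f"h={h}: tangent ratio={ratio(m1, x):.6f}, other line ratio={ratio(m2, x):.6f}")
```

For the tangent slope the ratio shrinks toward 0 along with h, while for the other line it settles near \(m_1-m_2=-1\).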

This is relevant in higher levels of calculus:

This is also the idea that has to be used to define the derivative (not just the partial derivatives!) of a function of more than one variable. It is sometimes called the “total differential” of the function f(x, y) at a point (a, b): the linear function L(x, y) = Ax + By such that

$$\lim_{(x,y)\to(a,b)}\frac{\{f(x,y)-f(a,b)\}-L(x-a,\,y-b)}{\|(x,y)-(a,b)\|}=0.$$

Does this help?

That last bit was likely beyond Akhtar’s current needs, but the whole idea – that the tangent line is the “closest fit” line at a given point – is essential. Calculus is essentially a way to define what we mean by that.

The other question I want to look at came from Amia in mid-July:

Hi Dr Math

I want to check the solution of this problem, and your opinion about if the inflection point is extra information, not needed to solve the question, and what shall I think to find first, the tangent point or c? The strategy for solving the question?

Thank you

Here we are given a line and need to find a curve, within a family of curves, to which it is tangent at an inflection point. This does sound like there might be too much information; commonly, if you had a family of curves (e.g. \(y = x^2 + c\)), you would expect just one of these curves to have a given line as a tangent at all:

The spectrum of curves are for \( c=-2,-1,0,1,2\), and \(c=0\) gives the curve tangent to the line shown.

Our family of curves is a little different:

The dashed line is for \(c=0\), and the spectrum shows \(c=1,2,3,4,6\). (We’ll be looking at negative values for *c* later, when I expand the problem.)

Of course, we can’t just draw these graphs to solve the problem; but it does look like only one such graph is tangent to the line, and it just happens to be at an inflection point.

Amia’s work is good. He made a system of equations representing that the given line passes through a point \((x, f_c(x))\), and is tangent to the curve at that point: $$\left\{\begin{matrix}\frac{3-x}{4}=\frac{2}{x^2+c}\\ -\frac{1}{4}=\frac{-4x}{(x^2+c)^2}\end{matrix}\right.$$ He solved for both variables by solving the first equation for \(x^2+c\), which had the effect of eliminating \(c\), and finding two possible values for \(x\), namely 1 and 4; then finding two possible values for \(c\) in each case. Only one of these fits the requirement that \(c>0\).

Amia’s question about whether to find the point or the parameter first is moot; he really found them both together as a solution of the system. But the question about the inflection point is interesting, and the extraneous solutions made me curious to dig deeper.

I answered, having tried solving it algebraically with a twist:

Hi, Amia.

First, you’re right that the problem can be solved without using all the information; but since the problem requires that the point of tangency be an inflection point, you must check that it is. You can’t ignore a condition in the problem!

In other words, Amia just found *c* such that the given line is a **tangent** to the curve, but now needs to verify that the point of tangency is a **point of inflection**. Without that check, we might (conceivably) get an invalid solution. We can see from the green graph above that his solution does pass this test visually; to test it algebraically, we observe that the second derivative is $$f''(x)=\frac{4(3x^2-c)}{(x^2+c)^3}$$ which is zero at \(x=\pm\sqrt{\frac{c}{3}}\). For Amia’s solution, \(c=3\), this is \(x=\pm 1\), one of which is his point of tangency. So he’s good.
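For anyone who wants to double-check that second derivative without redoing the calculus, here is a rough Python sketch. It assumes the family \(f_c(x)=\frac{2}{x^2+c}\) from Amia’s system; the function names and the step size `h` are illustrative choices, and the finite-difference estimate is only approximate.

```python
def f(x, c):
    return 2 / (x**2 + c)

def f2(x, c):
    # Closed form quoted above: f''(x) = 4(3x^2 - c) / (x^2 + c)^3
    return 4 * (3 * x**2 - c) / (x**2 + c) ** 3

def f2_numeric(x, c, h=1e-4):
    # Central-difference estimate of the second derivative
    return (f(x + h, c) - 2 * f(x, c) + f(x - h, c)) / h**2

c = 3
print(f2(1, c))                         # second derivative at the candidate point x = 1
print(f2(0.9, c), f2(1.1, c))           # values on either side of x = 1
print(f2_numeric(0.9, c), f2(0.9, c))   # estimate versus closed form
```

The second derivative vanishes at \(x=1\) and has opposite signs on either side, so the concavity really does change there; the numerical estimate agrees closely with the formula.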

Second, there is more to check besides that. In your final steps, you appear to have rejected three of the possible values for c solely on the basis that they are negative. But you only used the fact that y' must be the same on the line and the curve, not that y must be the same. If we ignore the (unnecessary, as it turns out) requirement that c > 0, and check all four values against f(x), we find that both c = 3 and c = -24 work, but c = -5 and c = -8 do not. Alternatively, if you used the equation of f to solve for c, you would get only c = 3 and c = -24 as solutions.

There are two issues here. The important one is that a tangent line must not only have the right slope, but also pass through the point on the curve! Amia had used both criteria to form a system of equations and find *x* and *c*, but had not checked that both were satisfied. I checked all four possible solutions, \((x=1,c=3)\), \((x=1,c=-5)\), \((x=4,c=-8)\), and \((x=4,c=-24)\), out of curiosity, and the middle two weren’t really solutions even if we allowed negative values for *c*; they are extraneous solutions to the system of equations. We’ll see more about that momentarily.
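That check of the four candidate pairs is easy to reproduce with exact rational arithmetic. The following Python sketch assumes the line \(y=\frac{3-x}{4}\) and the family \(f_c(x)=\frac{2}{x^2+c}\) from the system above; the helper names are illustrative only.

```python
from fractions import Fraction as F

def line_y(x):
    return F(3 - x, 4)                        # the given line y = (3 - x)/4

def curve_y(x, c):
    return F(2, x**2 + c)                     # f_c(x) = 2/(x^2 + c)

def curve_slope(x, c):
    return F(-4 * x, (x**2 + c) ** 2)         # f_c'(x) = -4x/(x^2 + c)^2

LINE_SLOPE = F(-1, 4)
candidates = [(1, 3), (1, -5), (4, -8), (4, -24)]

results = {}
for x, c in candidates:
    same_slope = curve_slope(x, c) == LINE_SLOPE   # slope condition
    same_y = curve_y(x, c) == line_y(x)            # line actually meets the curve
    results[(x, c)] = (same_slope, same_y)
    print(f"x={x}, c={c}: slope matches={same_slope}, y matches={same_y}")
```

All four pairs have the right slope, but the line actually meets the curve only for \(c=3\) and \(c=-24\).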

So what do these four “solutions” actually represent? For that matter, what do the graphs look like if we allow *c* to be negative?

Not having graphed this family of functions, I was not yet aware how different the curves are for negative *c*. Here is what they look like, for \(c=-1,-2,-4,-8,-24\), as well as \(3\):

This shows both the real solution and the almost-solution: a point of tangency at \((4,-\frac{1}{4})\) with \(c=-24\), which is *not* an inflection point.

Now I showed some of these details:

But then when you check the second derivative, you will find that for c = 3 the line is tangent at the point of inflection, while for c = -24 it is not. Here are graphs of those two cases (3 in blue, -24 in green), showing that the given line is tangent to both, but at different places (x = 1 and x = 4, respectively, the latter not being a point of inflection):

So we can still reject \(c=-24\), this time because it is tangent at a place other than the point of inflection (in fact, there is *no* POI for this curve).

But the other two points are rejected for a different reason:

Here are graphs of the other two non-solutions (c = -5 in blue, c = -8 in green):

I included the actual tangent lines (dotted) showing that these curves have the right slopes at x = 1 and x = 4, respectively, but have the wrong y (specifically, the wrong sign).

How did this happen? Looking back at Amia’s work, we see that in his equation (*''') he squared the equation derived from the value of *y*, losing information about its sign. This is one of the usual reasons for extraneous solutions, and demonstrates why it was necessary to check that both equations were really satisfied.

So you got the right solution, but failed to check everything that needed to be checked, so you were lucky that you didn’t get an extraneous solution.

If the problem had not specified c > 0, there would still be only one solution, but more ways you could get it wrong.

In fact, they could have dropped either the requirement that *c* be positive, or the requirement that the point of tangency be a point of inflection, and still get a unique solution; but if they dropped *both*, there would be two solutions.

I get the correct answer by doing the “obvious”, first finding the inflection point and either solving for x in terms of c or c in terms of x, then considering tangency at that point. Your method is “opportunistic”, using a shortcut you saw, which is a good idea and perhaps saved a little work, but that can sometimes be dangerous, too.

In any case, it’s an interesting problem.

Let’s close by solving the problem this way (still allowing *c* to be negative).

If we first find the inflection point, we find, as I did above, that possible locations are \(x=\pm\sqrt{\frac{c}{3}}\). This immediately eliminates negative values of *c* from contention!

Now, letting *x* have this value and seeking a tangent, we can find the curve with the desired slope at the inflection point by replacing *x* in the equation for the first derivative. We find that $$f'(x)=\frac{-4x}{(x^2+c)^2}=\frac{\mp 4\sqrt{\frac{c}{3}}}{(\frac{c}{3}+c)^2}=\mp\frac{3\sqrt{3}}{4c\sqrt{c}}$$ This is equal to the slope of the line, \(-\frac{1}{4}\), when we take the upper sign (positive \(x\)) and \(c = 3\), and therefore \(x=1\). This is the solution we found before, with nothing extraneous.

But … we haven’t yet checked whether the line is actually tangent to the curve! Does it intersect the curve at \(x=1\)? Yes: The line has $$y=\frac{3-x}{4}=\frac{3-1}{4}=\frac{1}{2}$$ and the curve has $$y=\frac{2}{x^2+3}=\frac{2}{1^2+3}=\frac{1}{2}$$
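The same inflection-first route can be traced numerically. The Python sketch below is only an illustration under stated assumptions: it uses bisection (my choice of method, with an arbitrary bracket of 0.1 to 100) to find the positive *c* for which the slope at the inflection point \(x=\sqrt{\frac{c}{3}}\) equals \(-\frac{1}{4}\), then compares the heights of the line and the curve there.

```python
import math

def slope_at_inflection(c):
    # f'(x) at x = +sqrt(c/3), for f(x) = 2/(x^2 + c) with c > 0
    x = math.sqrt(c / 3)
    return -4 * x / (x**2 + c) ** 2

def solve_c(target=-0.25, lo=0.1, hi=100.0, tol=1e-12):
    # Bisection: the slope rises steadily toward 0 as c grows
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if slope_at_inflection(mid) < target:
            lo = mid      # still too steep: a larger c is needed
        else:
            hi = mid
    return (lo + hi) / 2

c = solve_c()
x = math.sqrt(c / 3)
print(c, x)                              # c close to 3, x close to 1
print((3 - x) / 4, 2 / (x**2 + c))       # line and curve heights at that x
```

The search lands on \(c=3\) and \(x=1\) to within rounding, and the two heights agree, which is the final check carried out by hand above.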

So, just as in Amia’s solution, we had to do an extra check at the end to make sure our solution met all the requirements. That is typical of an overspecified problem.

]]>