This week I answered a seemingly simple question that can be solved in several different ways when presented as multiple choice, but is rather difficult as a straightforward algebra problem. Trying to guess what the “patient” had done yielded an invalid method that gave the right answer — or was it *really* invalid? Luckily, something about it seemed familiar, and I found the answer in a question from last year, which itself had a complicated history! The result is worth sharing.

Here is the question, from Sinan:

The problem is:

If \(f:\left(-\infty, \frac{3}{2}\right] \to \left[-\frac{29}{4}, +\infty\right)\) is defined by \(f\left(x\right) = x^2 - 3x - 5\), find

\(a\) such that \(f\left(a\right) = f^{-1}\left(a\right)\).

A) 1 B) 1/2 C) -1 D) -3/2 E) 5

I found the inverse function and by the equation found the answer as 5. But 5 is not a desired answer according to the information given.

My initial response was to ask for details, but in the meantime I suggested a possible error he might have made, and indicated both the direct method (which turned out to be what Sinan had done), and an easier method by which I got the correct answer:

As always, my first response is, please show how you got 5 as an answer! Just telling us what wrong answer you got doesn’t help us help you.

Now, I suspect you may have misread the question. It is possible that you solved \(f\left(x\right) = x\), that is, \(x^2 - 3x - 5 = x\), which finds a value of \(a\) such that \(f\left(a\right) = a\), not \(f^{-1}\left(a\right)\). But you say you found the inverse, so this seems unlikely; and you should have found two solutions, one in the required domain. So I can’t tell what mistake you might have made. Please show some details.

The direct way to solve would be to find an expression for \(f^{-1}\left(x\right)\) and solve \(f\left(x\right) = f^{-1}\left(x\right)\). What I did instead, saving the work of finding the inverse, was to apply \(f\) to both sides, so that \(f\left(f\left(x\right)\right) = f\left(f^{-1}\left(x\right)\right) = x\). So I found an expression for \(f\left(f\left(x\right)\right)\) and solved \(f\left(f\left(x\right)\right) = x\). This gave me a fourth degree equation, which would be unpleasant, except that using the rational root theorem, together with the list of choices, I only had to check three possibilities (by synthetic division).

Does that give you enough ideas?

Sinan responded by showing his attempt to follow my suggestion, together with what he had first done:

Yes by your method the answer is -1.

\(x = y\)

\(x = x^2 - 3x - 5 \Rightarrow x = -1\)

Here is my wrong solution:

\(f\left(x\right) = x^2 - 3x - 5\)

\(f\left(x\right) = \left( x - \frac{3}{2}\right)^2 - \frac{29}{4}\)

\(f^{-1}\left(x\right) = \sqrt{x + \frac{29}{4}} + \frac{3}{2}\)

\(\left( x - \frac{3}{2}\right)^2 - \frac{29}{4} = \sqrt{x + \frac{29}{4}} + \frac{3}{2}\)

I tried the five choices, one by one. When I chose \(x = -1\), \(f\left(x\right) \ne f^{-1}\left(x\right)\).

It turned out that his original work was not what I thought, but was close to being correct; he took the “direct way” I’d described, but in finding the inverse, he neglected to choose the negative sign on the radical, which is required because the domain of *f* is restricted to the *left* side of the parabola. Then he checked each of the five choices. The only reason I didn’t do this was my bias against using given choices by trial and error rather than actually solving the problem mathematically. (I don’t like to assume that the choices include all correct answers.) But what he did was undoubtedly the easiest way to solve it, given that it is a multiple-choice problem.

His equation should have been:

\(\left( x - \frac{3}{2}\right)^2 - \frac{29}{4} = \frac{3}{2} - \sqrt{x + \frac{29}{4}} \)

Solving this radical equation algebraically yields the same quartic equation I got using \(f\left(f\left(x\right)\right) = x\), namely \(x^4-6x^3-4x^2+38x+35 = 0\). The rational root theorem leads us to try -1 and 5, and the result lets us find the other two solutions, \(1 \pm 2\sqrt{2}\). Of these four solutions, only the first satisfies the original problem.
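For readers who want to verify this numerically, here is a short Python sketch (my own illustration, not part of the original exchange) that checks each root of the quartic against the original equation, using the restricted branch of the inverse:

```python
import math

def f(x):
    """f(x) = x^2 - 3x - 5, restricted to the domain (-inf, 3/2]."""
    if x > 1.5:
        raise ValueError("outside the domain of f")
    return x * x - 3 * x - 5

def f_inv(y):
    """Inverse on that branch: note the *negative* sign on the radical."""
    if y < -29 / 4:
        raise ValueError("outside the domain of f^-1")
    return 1.5 - math.sqrt(y + 29 / 4)

# The four roots of x^4 - 6x^3 - 4x^2 + 38x + 35 = 0:
candidates = [-1, 5, 1 - 2 * math.sqrt(2), 1 + 2 * math.sqrt(2)]

solutions = []
for a in candidates:
    # confirm each candidate really is a root of the quartic
    assert abs(a**4 - 6 * a**3 - 4 * a**2 + 38 * a + 35) < 1e-9
    try:
        if abs(f(a) - f_inv(a)) < 1e-9:
            solutions.append(a)
    except ValueError:
        pass  # outside a domain, so it cannot satisfy the equation

# solutions == [-1]: only the first candidate survives
```

The candidates 5 and \(1 + 2\sqrt{2}\) fail because they lie outside the domain of *f*, and \(1 - 2\sqrt{2}\) fails because it solves the reflected equation instead.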

Sinan just checked the given choices, and found that -1 is the one that works:

LHS: \(\left( -1 - \frac{3}{2}\right)^2 - \frac{29}{4} = \left( - \frac{5}{2}\right)^2 - \frac{29}{4} = -1\)

RHS: \(\frac{3}{2} - \sqrt{-1 + \frac{29}{4}} = \frac{3}{2} - \sqrt{ \frac{25}{4}} = -1\)

Note an interesting fact here: both sides of the equation happen to equal *x* itself. This will be important!

But in his attempt to do what I said, he instead did what I said *not* to do. So after explaining what I just showed, I continued:

Your new work is not the work I said to do; it is what I said you might have done, which is technically wrong: “It is possible that you solved \(f\left(x\right) = x\), that is, \(x^2 - 3x - 5 = x\), which finds a value of \(a\) such that \(f\left(a\right) = a\), not \(f^{-1}\left(a\right)\).” That is not what you were asked to do.

I considered saying that this is a poorly written problem, because taking an unjustified shortcut like this happens to give the right answer, but I chose not to reveal that.

But I was getting curious: the “invalid shortcut” I had initially assumed Sinan had used by misreading the problem actually gave the correct solution; could it really be valid? I wanted to be sure. Examining the graph of the function and its inverse, I saw that the solution lies on the line \(y = x\):

If that is *always* going to be true, then it will be valid to solve \(f\left(x\right) = x\) rather than \(f\left(x\right) = f^{-1}\left(x\right)\).

But when I had initially graphed the functions, I had not taken the trouble to include the domain restriction; I had seen this:

Here we see four intersections, only two of which lie on \(y = x\). (One of those, by the way, is the wrong solution Sinan had first found, by taking the wrong part of the inverse function.) But on closer inspection, the two other intersections are not between a function and its own inverse, but between the positive half of one relation and the negative half of the other.

I tried a couple more functions, and the same thing happened. So I was wondering whether my shortcut was actually valid. I wouldn’t want to call a valid method invalid. Maybe the conjecture is true as long as we are talking about a one-to-one function and its inverse …

But then I realized that this question seemed familiar, and vaguely recalled that the answer involved functions that were monotonically decreasing. Searching my records, I found the following (unarchived) answer from October, referring back to a previous unarchived answer that had been wrong. This is getting complicated!

Question [10/14/2017, from Fida]:

I did a bit of experimenting and I came to the conclusion f(x)=f^-1(x) if there is a solution, then the point of intersection lies on the line y=x

Answer:

Actually, it is NOT true that the point of intersection must have y=x! It is easy to convince oneself (short of an actual proof) that it should be so, but here is one counterexample:

f(x) = -x^3
f^-1(x) = -x^(1/3)

f(x) = f^-1(x)
==> -x^3 = -x^(1/3)
x^3 = x^(1/3)
(x^3)^3 = (x^(1/3))^3
x^9 = x
x^9 - x = 0
x(x^8 - 1) = 0
x = 0 or x = 1 or x = -1
y = 0    y = -1   y = 1

So there are three real solutions, but only one of them lies on the line y=x. You can see what is going on when you graph the two functions; the intersection points are reflections of one another.

It is VERY easy to come to the wrong conclusion; looking back, I see at least two places where Math Doctors in the past have agreed with you. Here is a question and answer from 9 years ago about this, referring to an answer given the previous year that was not archived, but was posted by its recipient on a Greek discussion group, in support of one person's belief about the problem:

Question [6/25/2008, from Nikos]:

Suppose that f is an invertible real function of one variable. It is known that the graphs of f and f^{-1} are symmetric with respect to the line y=x. It is also known that if f is increasing then the common points of f and f^{-1} (if there exists) are on the line y=x.

My question is: What can we say when f is decreasing? The intersections of f and f^{-1} only lie on the y=x or is it possible to meet at points not belonging on the y=x? Example f(x)=-x^3.

I am a mathematician but I really want your help. In my opinion the graphs of f(x)=-x^3 and its inverse meet at 3 points. (-1,1) (0,0) (1,-1). But I get confused when I read at http://forum.math.uoa.gr/viewtopic.php?t=984&sid=865caa471f0db868f5826a68b1d1dcd2 the following:

Question [11/16/2006, from Bilstef]:

Let function f, which is invertible. Then which are the points where f and f^-1 meet? How do we find them? By solving only the f(x)=x? What happen if f is increasing and what if decreasing? Are the meeting points only on the line y=x?

Answer:

Hello,

You're correct, the place where a function and its inverse intersect must always be on the line y = x. Do you understand why? Please write back if you would like an explanation.

- Doctor Gaff, The Math Forum

thank you in advance
Nikos Fotiades

Answer:

We put answers we think are worth sharing with others in our archives; sometimes people write to us when they find an error there, and we correct it. We are only human, and are grateful when we are corrected. Evidently this answer, which we did not archive, was posted on a site we do not control, so we can't correct it. That is unfortunate.

This is simply a wrong answer, and we have since answered the same question more correctly when others asked something similar. It is not true that a function can only intersect its inverse on the line y=x, and your example of f(x) = -x^3 demonstrates that. There are many others, of course; these include functions that are their own inverse, such as f(x) = c/x or f(x) = c - x, and more interesting cases like f(x) = 2 ln(5-x).

The correct statement is that f will intersect f^-1 at any x for which f(x) = x, but may intersect in other places as well. I presume you have a proof that a monotonically increasing function can intersect its inverse only on y=x; I don't think that is hard to prove. Clearly, it is not true for a decreasing function.

A mathematician knows not to trust what anyone says without proof; your counterexample shows that he can't have a proof, so you can ignore (or correct) that hastily made claim.

- Doctor Peterson, The Math Forum

Here is the subsequent answer I referred to; I later realized that it appears to be the same “Bilstef” under a different name, so we had given him both the wrong, and then the correct, answer:

Question [12/2/2007, by Vasilis]:

let f(x) = - x^3+1 then f^-1(x)= (1-x)^(1/3). were do their graphs intersects Are their graph intersect only on the y=x and why? Their graphs thru Graphmatica programme show that they intersect also in (0,1) and (1,0). Is that correct? Why? How do we find where the graph of f and f^-1 intersect?

Answer:

Hi, vasilis --

The graphs of any function and its inverse will intersect on the line y=x. But the graphs can intersect in other places also. In your example, the two graphs intersect on the line y=x and also at (0,1) and (1,0).

- Doctor Greenie, The Math Forum <http://mathforum.org/dr.math/>

Finishing the answer to Fida:

Both Dr Gaff and Bilstef subsequently attempted to prove that y=x, but both proofs have major gaps. Both undoubtedly tried to visualize intersecting graphs and thought they saw an impossibility, so that they didn't think they had to be careful with their proofs, because it seemed obvious. Clearly, this is a tricky problem!

What is happening is that f(x) = f^-1(x) will be true for any x such that f(x) = x; but more generally it is true whenever f(f(x)) = x, as you can see by applying f to both sides of the original equation. For this, it is not necessary that f(x) itself be x.

As you can see, Doctor Gaff misspoke; Doctor Greenie then recognized the correct answer; then I was almost convinced but realized my error. This illustrates why mathematicians don’t like to speak off the cuff and say something they think is right but haven’t proved. This is also why I took the time to pursue this question, having actually written and then deleted comments that suggested the “shortcut” might be right.

To tie up a loose end, here is the graph of the classic counterexample:

Clearly, the solutions of \(f\left(x\right) = f^{-1}\left(x\right)\) are *not* the solutions of \(f\left(x\right) = x\). So I was right: we *can’t* use the latter to solve the former (unless *f* is an **increasing** function).
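The counterexample is easy to check numerically. Here is a small Python sketch (my own illustration, not part of the original correspondence) confirming that \(f\left(x\right) = -x^3\) meets its inverse at three points, only one of which lies on the line \(y = x\):

```python
import math

def f(x):
    return -x**3

def f_inv(x):
    # real cube root of -x; math.copysign handles negative inputs,
    # where Python's ** operator would produce a complex result
    return -math.copysign(abs(x) ** (1 / 3), x)

# All three candidate points are intersections of f with its inverse...
meet = [x for x in (-1.0, 0.0, 1.0) if abs(f(x) - f_inv(x)) < 1e-9]

# ...but only one of them lies on the line y = x:
fixed = [x for x in (-1.0, 0.0, 1.0) if abs(f(x) - x) < 1e-9]
```

Here `meet` contains all of -1, 0, and 1, while `fixed` contains only 0, just as the graph shows: solving \(f(x) = x\) misses the two off-diagonal intersections.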

Having recently discussed a couple different issues that touched on the relationship of math to reality, I was reminded of this old favorite – a question that is not asked often enough, and reveals some of the “dirty little secrets” behind math. Math is not reality; it is often used to *model* reality, but as such is usually just an approximation. We need to determine in some way whether the math we use is applicable. Or, to put it another way, math has its *own* reality, which may or may not match up with the uses to which we put it.

Recall that probability is defined, at a basic level, by counting the number of equally likely outcomes (such as the six numbers you can get by rolling a die), and dividing the number of “successful” outcomes by the total number of outcomes. It is important that you know the outcomes you count are in fact equally likely.

But how do you know whether they are?

Here is the question, from Chris in 2008:

How Do You Know That Events Are Equally Likely? How do you determine whether the events of a problem are equally likely? I can't seem to find any information regarding how to determine if events are equally likely. Most texts don't get into it at all and others use circular logic to show it. For example, the events of a fair coin toss are equally likely because they each have a probability of 1/2. But you can only use that calculation once you have determined that the events are equally likely. How do you make that determination?

Textbooks probably gloss over this because they think it would be too complicated to explain, and perhaps because it might shake students’ confidence in math. But I think it is important to have an accurate picture of where math fits in. I gave three different answers to the question.

You may have noticed that in many math problems they will say "Assume the coin is fair", or something like that. This may seem like cheating, but it is really what math is all about: reasoning from "axioms" (basic assumptions) that define the subject of our reasoning. Whether the assumption makes sense is the subject of the second answer, but we're not ready for that yet!

In math, we commonly start with a model of some concept in the world, such as a fair coin or a flat surface; rather than deal with all the complexities of real coins or surfaces, we think about what would be true of an IDEAL coin (heads and tails are equally likely, and there's no other option like standing on edge) or plane (you can draw one line through any two points, etc.). Then we reason based on those assumptions, so that all our conclusions will be definitely true IF those assumptions are true.

Math deals with idealized situations. We *want* that coin to be “fair”, because that makes it easier to work with; so we *assume* it is when we do our calculations – creating for ourselves this ideal imaginary world in order to reason about it. Math can’t work without assumptions, because it is essentially nothing more than logical reasoning applied to assumptions (axioms).

Any real coin is likely to be a little biased in one direction or the other; and there will not be exactly as many boys born as girls. But experience tells us that it is reasonable to assume, for many purposes, that an ordinary coin is likely to be very close to fair, and that one is about as likely to have a boy as a girl. So we make those "simplifying assumptions" when we don't need extreme precision in our answers.

Sometimes we do want to be really sure of our answers (in the real world--maybe because real money depends on it); then we don't just go by general EXPERIENCE, but by careful EXPERIMENT. We toss a thousand coins thousands of times each under controlled conditions and determine just how close to fair the average coin is, and how far from fair any given coin is likely to be. This field of study is called statistics, and it can provide the basis for exact calculations of real probabilities--as far as we know, and as long as the population of coins or children we are working with matches the one we studied. Again, we can never be exactly sure ...

We want our results to be reasonably *close* to reality, so we choose to make reasonable assumptions. So the ideal world we work in is intended to be a *good* model of the real one. When it matters, we check whether the results of our assumptions match with observation — everyday experience when that’s enough (as in a classroom), and precise experiment when we need more assurance (as in a lab).

The third answer is that in problems you are given, you are expected to choose equally likely outcomes based on a combination of standard assumptions that have been presented in your text or elsewhere, common sense, and your knowledge of probability.

For example, when you toss two coins, you either have been told to assume they are fair coins, or you know from experience or from statistics that they are close to fair, so it makes sense to consider heads and tails on EACH INDIVIDUAL coin as equally likely. You also know enough about probability to realize that that assumption would conflict with the easier assumption that "no heads", "one head", and "two heads" are equally likely. In particular, you have learned that compound events such as this can't be assumed to be equally likely, but simple events (like a single coin) often can.

This becomes a sort of intuition: you've seen it happen enough that you will be much more willing to assume that simple events are equally likely than that more complicated things are. You break things down to the simplest possible parts, and then decide whether it makes sense to suppose that those are equally likely.

So in a problem, if you are not directly told which outcomes are equally likely, you go to the simplest events you can find (those from which compound events can be built), and ask yourself whether common sense, experience, or previous statements in the book make it reasonable to assume they are equally likely. (The assumption that no heads, one head, and two heads are equally likely is wrong, because if each coin is fair, then the outcomes HH, HT, TH, and TT are equally likely, so that the probability of one head is 2/4 = 1/2, not 1/3.) In real life, where it is not a teacher but your results that will judge your correctness, you would look at statistics to confirm your assumptions.
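The two-coin claim is easy to confirm by simulation. Here is a short Python sketch (my own illustration) that estimates the three probabilities empirically, using a fixed seed so the experiment is reproducible:

```python
import random

random.seed(1)  # fixed seed for reproducibility
trials = 100_000

counts = {0: 0, 1: 0, 2: 0}
for _ in range(trials):
    # two independent fair coins; randint(0, 1) is one coin flip
    heads = random.randint(0, 1) + random.randint(0, 1)
    counts[heads] += 1

probs = {k: v / trials for k, v in counts.items()}
# "no heads / one head / two heads" are NOT equally likely:
# the estimates land near 1/4, 1/2, 1/4 rather than 1/3 each.
```

The simulation stands in for the careful EXPERIMENT described above: the compound outcomes are demonstrably not equiprobable, while the simple per-coin outcomes are.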

One more comment:

I should note one other thing: It is possible to solve problems WITHOUT any equally likely outcomes, and higher level study of probability does just that. The idea of equally likely outcomes just makes it easy to explain the basic concepts, and to solve problems. If you are told that a coin had a probability of 0.30 of heads and 0.70 of tails (as for that spinning penny), you can just take THAT as your assumption, with no actual equiprobable events in sight.

So, the initial question came from a context where probability is defined in terms of equally likely events; but once we get past the basics, we can work with probabilities in a more general way, starting with axioms that do not require equally likely outcomes. This gives us a lot more flexibility. For an introduction to that, see this page:

Probability Axioms and Theorems

To close, I want to look at an answer to another question (from Kelly in 2010) that relates to the same topic:

Equally Likely vs. Equally Possible When a baby is born, it is either right-handed or left-handed. Are these possibilities equally likely? From what I know, most babies are right-handed, but it seems there is still a 50/50 chance they could be born either. I'm actually a parent trying to understand my son's homework. There are several examples that seem clear enough, like rolling a number cube. But another one says that the Pittsburgh Steelers play a game -- and either win, lose, or tie. As with the question about which hand will be a baby's dominant one, it seems "unlikely" that a professional football game will end in a tie; but the question is to decide and explain whether the possible results are "equally likely." I can't explain this.

This is an example of how textbooks try to help students understand the importance of equally likely outcomes in the (elementary) definition of probability. It is meant to relate to common sense, but I imagine a lot of people, like Kelly, feel uncomfortable using common sense in talking about math. What do we really mean by equally likely? Sound familiar?

I gave a brief answer, and referred to the discussion above:

"Equally likely" is sometimes implicit in our definition of a problem (like the die, or "number cube," which is designed to make each outcome happen as often as any other), and sometimes dependent entirely on our empirical observations. It's conceivable, I suppose, that a pro football team might win 1/3 of their games, lose 1/3, and tie 1/3; but certainly there is no reason to expect that! "Equally likely" is about what we expect, not what might happen.

The main idea here is to make students stop and think before they use a set of outcomes as the basis for probabilities. The mere fact that there are 3 possible outcomes does not mean that each has probability 1/3; you need to have some basis for supposing that this is true. So don't overthink the questions; if it's at all questionable that a set of outcomes are equally likely, they aren't!

In summary: if you’re told to assume something, assume it; if it makes a big difference, check it; if it makes sense to expect it, expect it; and if you’re just in a class, relax and go with common sense.

Here is a question that will serve as a good overview. Sarah in 2013 asked,

Extraneous Routes I know that radical, log, absolute value, and fractional operations sometimes introduce extraneous roots. When else do I need to check for them? Is it only by an equation?

In case you are not familiar with the concept, an extraneous solution (also called an extraneous root) is a solution you get in the process of solving, that turns out not to be a solution of the original equation. It is *not* actually a solution, and is not inherent in the problem itself, but is introduced by what you do.

I sometimes illustrate the idea by imagining that we work in a lab analyzing blood samples. Suppose we add some reagent to the sample in order to determine, say, whether there is any arsenic in the blood, and then (still using the same sample) we do a test for some other class of poisons – but the reagent we used for the first test is one of them. Then we will find that chemical in our sample, not because it was originally there (though it may have been), but because we put it there! We need to know that we *introduced it* into the sample, and either ignore it or do some other test to see if it should be included on our report. In the same way, we need to pay attention to what we have done that may introduce an extraneous solution, so we can check for it in the end.

Sarah’s question was a broad one, so I gave a quick survey of the concept, emphasizing that it is not primarily the *type of equation* that requires the check (though that is a good clue), but *what you do* in solving it:

You have to check for extraneous roots whenever -- in the process of solving -- you have done something that is not guaranteed to produce an equivalent equation. I wouldn't want to claim to give a complete list; any time you use a technique you haven't used before, you should determine for yourself whether it falls in this category. But the most familiar of these are

multiplying by an expression containing the variable, which you do in solving rational equations, and which might result in unintentionally multiplying by zero;

squaring or raising to another even power, which you do in solving radical equations or equations with fractional exponents, and which loses information about signs; and

simplifying logarithmic expressions, which can change the domain of an expression.

Certain things you do in solving absolute values can also do this, though in many cases this can be handled by paying close attention to conditions rather than just checking at the end.

What matters most is not the kind of equation you are solving so much as the things you do in solving it. For example, sometimes people solve absolute value equations by squaring -- that falls under one category I listed. Other people will do something different that does not carry any risk.

Let’s look at each of these main types, to fill in some details.

Suppose we start with the equation \(x = 3\), and then (just for fun) multiply both sides by *x*. What do we get? \(x^2 = 3x\). If you solve this new equation (be careful!), you find that it has two solutions, 3 and 0. The original equation had only one solution, 3 (obviously). So the new equation is not equivalent to the original; the new one has an extra solution, 0. Why? Because when you multiply by 0, both sides become 0, and what may have been a false equation is now a true one. So any value of *x* for which the multiplier is zero will look like a solution, whether or not it was.

Of course, we wouldn’t multiply that equation by *x* to solve it; but we *would* do this to solve a **rational equation** like \(\frac{1}{x} + 2 = \frac{x + 1}{x}\); after multiplying by *x*, we have \(1 + 2x = x + 1\), and we find that the solution is \(x = 0\). But if *x* is zero, then we multiplied by zero, so of course that will appear to be a solution! In fact, if we check \(x = 0\) in the original equation, we find that both sides are undefined, which means that 0 is not a solution.

Note that there is something extra happening here: When we multiplied by *x*, we also *changed the domain* of the equation, by eliminating the denominator. In fact, this is what we really need to check for: If our claimed solution is not in the domain of the original equation, it is extraneous, and must be ignored.
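This domain test is easy to mechanize. In the following Python sketch (my own illustration), the extraneous candidate announces itself by making the original equation undefined, raising a division-by-zero error:

```python
def lhs(x):
    """Left side of the original equation: 1/x + 2."""
    return 1 / x + 2

def rhs(x):
    """Right side of the original equation: (x + 1)/x."""
    return (x + 1) / x

# Multiplying through by x gave 1 + 2x = x + 1, whose only root is x = 0.
candidate = 0

try:
    valid = abs(lhs(candidate) - rhs(candidate)) < 1e-9
except ZeroDivisionError:
    # not in the domain of the original equation: the root is extraneous
    valid = False
```

Since `valid` comes out `False`, the equation has no solution at all; the lone candidate was manufactured by the multiplication.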

Suppose we start with the same equation \(x = 3\), and then square both sides. This time we get \(x^2 = 9\), which has two solutions, 3 and -3. Again, the new equation is not equivalent to the original; we introduced the extra “solution” -3. Why? Because when you square, you lose information about signs. If we had started with \(x = -3\), we would have ended up with the same equation. So when we solve the squared equation, we are actually solving *both* original equations at once; our “solutions” may be solutions of one, or the other, or both, and we don’t know which until we check.

This typically happens when we are solving a **radical equation**. As an example, we might solve \(\sqrt{x + 6} = x\) by squaring, which yields \(x + 6 = x^2\). The solutions of this new equation are 3 and -2. The first of these, 3, is in fact a solution of the original equation: \(\sqrt{3 + 6} = 3\). But -2 is *not* a solution: \(\sqrt{-2 + 6} = 2\), not -2. What happened? We actually solved the negated equation, \(-\sqrt{x + 6} = x\), of which -2 *is* the solution!

There is more going on here, too. Sometimes (though not usually), the equation is undefined because the radicand becomes negative; an example of this is \(\sqrt{2x + 1} = \sqrt{x}\). Squaring yields \(2x + 1 = x\), whose solution is \(x = -1\). When we check this, we find that -1 is not in the domain of the equation, so that rather than being a solution of the negated equation, it is not a solution of either. The usual problem, as in the first example above, can also be described as a *range* issue: because the radical symbol represents only the *positive* square root, it is this restriction on the range that caused our -2 not to be a solution.
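Both failure modes can be seen in a few lines of Python (my own illustration): a *range* failure makes the two sides unequal, while a *domain* failure raises an error for a negative radicand.

```python
import math

def check(lhs, rhs, x):
    """Return True iff x satisfies lhs(x) == rhs(x).
    math.sqrt raises ValueError on a negative radicand, which
    means x is outside the domain of the original equation."""
    try:
        return abs(lhs(x) - rhs(x)) < 1e-9
    except ValueError:
        return False

# sqrt(x + 6) = x: squaring gives x^2 - x - 6 = 0, with roots 3 and -2
ok_3  = check(lambda x: math.sqrt(x + 6), lambda x: x, 3)   # genuine
ok_m2 = check(lambda x: math.sqrt(x + 6), lambda x: x, -2)  # range issue

# sqrt(2x + 1) = sqrt(x): squaring gives x = -1
ok_m1 = check(lambda x: math.sqrt(2 * x + 1),
              lambda x: math.sqrt(x), -1)                   # domain issue
```

Only the first check succeeds: -2 fails because the radical denotes the positive root (a range issue), and -1 fails because the radicand goes negative (a domain issue).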

Here is an example from 2001 where a student discovered this for himself:

Extraneous Roots

Domain issues are the usual culprit in logarithmic equations. Here is an example: \(\log{x} + \log\left(x+3\right)= 1\). If we simplify (condense) the left side, we get \(\log\left(x^2 + 3x\right) = 1\) and then \(x^2 + 3x = 10\), whose solutions are \(x = 2\) and \(x = -5\). But because of the domain of the log, the negative solution is extraneous: \(\log{2} + \log\left(2+3\right)= 1\) is true, but \(\log\left(-5\right) + \log\left(-5+3\right)= 1\) is not. The issue here is that condensing a logarithmic expression (or some other types of expression) can change the domain. For more on this, see

Are Properties of Logarithms Missing Something?
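The log example above can also be checked mechanically. In this Python sketch (my own illustration, reading the logs as base 10, as the equation \(x^2 + 3x = 10\) implies), the extraneous root fails by falling outside the domain:

```python
import math

def original(x):
    """Left side of log x + log(x + 3) = 1, with log meaning base 10.
    math.log10 raises ValueError for a nonpositive argument."""
    return math.log10(x) + math.log10(x + 3)

# Condensing gave x^2 + 3x = 10, with roots 2 and -5.
roots = [2, -5]

valid = []
for x in roots:
    try:
        if abs(original(x) - 1) < 1e-9:
            valid.append(x)
    except ValueError:
        pass  # log of a nonpositive number: outside the original domain
```

Only 2 survives; -5 makes both logarithms undefined in the original equation, even though it satisfies the condensed one.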

Hidden in the details above is an important fact: each kind of extraneous solution has its own specific test. Students too often miss the fact that when they check a solution, it can fail for two very different reasons: It may fail because it is an extraneous solution that we introduced by our work (in which case the check is an essential part of the work itself); or it may fail just because it is wrong, a result of a mistake in our work. If it is **extraneous**, we can just ignore it (and say there is no solution, if we found no non-extraneous solutions); if it is **erroneous**, we have to go back and fix our work.

So when I teach about this, I always show how to distinguish an extraneous solution from an erroneous solution.

Here is a question from 2012:

Extraneous Roots Checked Less Tediously

Given

   (x - 4)/x + x/3 = 6

I solved this for x and found two roots:

   x = [15 - SQRT(273)]/2
   x = [15 + SQRT(273)]/2

Can anybody help me check whether one of these roots is extraneous or not? Plugging in these values of x and satisfying the left hand side and right hand side would take too long.

For simple values of x, like 2 and 3, it's easier to find the extraneous roots. But values of x that have two parts and contain square roots, as in my case, cause difficulty in finding the extraneous roots. Other than plugging in values, is there a simple way of checking for extraneous roots?

Muhammad knew the need to check the solutions, but the check here seemed too hard, since the solutions were ugly expressions. I gave him two pieces of advice that can simplify the checking process. First, since any extraneous solution to a rational equation will fail by making terms undefined, he only needs to check for that:

In this kind of equation, the source of extraneous roots is multiplication by x, which can introduce an extraneous root if x = 0 turns out to be a root of the new equation, because multiplication by 0 does not produce an equivalent equation. To put it another way, an extraneous root will prove to be extraneous only by making a denominator of the original equation zero. So all you need to do to test for an extraneous root is to make sure each solution is in the domain of the equation -- that is, none makes the denominator zero. In this case, it is clear that they don't.

Second, checking is also needed in order to make sure you didn’t make a mistake; but there, the accuracy of a calculator is good enough; you don’t need exact verification unless you are told to:

The first root, for example, is approximately -0.76135582. I would store that value in my calculator to avoid having to retype it three times, and use it in the equation:

   (x - 4)/x + x/3
     = (-0.76135582 - 4)/-0.76135582 + -0.76135582/3
     = 6.25378527 + -0.25378527
     = 6

If this had come out to 5.99999999, I would consider it verified!

I also gave a warning that many students need to hear: Once they have learned about extraneous solutions, they often start writing “no solution” whenever a check fails, even if it is a linear equation that can’t possibly be extraneous! Their knowledge of extraneous solutions seems to have displaced what they formerly knew, that they themselves are the most common cause of failed checks. Only ignore failed solutions that you *know* are extraneous.

Here is a question from 2017 about recognizing extraneous solutions caused by squaring:

One Variable in Two Radicals Solve for x: sqrt(2x - 5) - sqrt(x - 2) = 2 I tried squaring each term individually and then squaring the 2, but my roots are not the roots in the solution. ...

Here Phinah had several different issues, so I went through the whole process of solving; but at the end (before being asked) I pointed out how checking works here:

I'll mention one extra thing: when you are checking your answer, you can recognize an extraneous root (as opposed to an error due to, for example, an arithmetic mistake) if it satisfies an equation obtained by changing the sign of a radical. In this case, when you check x = 3, you get sqrt(2x - 5) - sqrt(x - 2) = 2 sqrt(2*3 - 5) - sqrt(3 - 2) = 2 sqrt(1) - sqrt(1) = 2 This is false. But it becomes true if you change a sign to sqrt(1) + sqrt(1) = 2 So this is actually a solution of sqrt(2x - 5) + sqrt(x - 2) = 2 This is indistinguishable from the given equation after squaring. That is why the extraneous solution arises.

This is typical: If the check fails, see if changing the sign on one or both radicals would make it correct. If so, then the solution is extraneous and you can quietly cross it out; but if not (if you got, say, \(\sqrt{3} – \sqrt{2} = 2\), which is simply false), then you have to find the error, because your work was wrong.
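To illustrate the sign-flip test in code: squaring this equation twice (my own working, which the original answer does not show) yields the candidates x = 3 and x = 27, and a quick evaluation shows which equation each one actually satisfies:

```python
import math

# The given equation and its sign-flipped variant; both become the same
# equation after squaring, which is why the extraneous root appears.
def original(x):
    return math.sqrt(2 * x - 5) - math.sqrt(x - 2)

def flipped(x):
    return math.sqrt(2 * x - 5) + math.sqrt(x - 2)

print(original(27))  # 2.0 -> a genuine solution of the given equation
print(original(3))   # 0.0 -> fails the check...
print(flipped(3))    # 2.0 -> ...but satisfies the sign-flipped equation,
                     #        so x = 3 is extraneous, not an error
```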

There are some other things worth discussing that are related to extraneous solutions, but I will just provide links:

Here I offer an alternative (which I had never thought of previously) that simplifies the check in some cases:

Avoiding the Final Step of Checking for Extraneous Solutions

Here I discussed the opposite issue, where you can lose a valid solution rather than find an invalid one:

Root Propagation and Loss

And here I discussed similar issues in trigonometric equations:

Extraneous Roots Introduced Trigonometrically

Algebra for Equivalence

Occasionally we get questions challenging the correctness of a textbook problem, or of a test grade. And sometimes we get questions about mathematics used in sciences like physics or chemistry, which lead us to explore unfamiliar fields. The most interesting question this week is one of these. It is one where we first had to gather necessary information, and then two of us worked together to figure out what was going on. This is not something at which we are experts; but sometimes watching someone fumble around in an unfamiliar area can be instructive.

Here is the question, from Fida; I will be editing the whole discussion to save space:

I found this equation: \(k = A e^{\frac{-E_a}{RT}}\).

Why does the graph between k and T look like that? (See attached picture please) because \(k = c e^{\frac{-1}{T}}\) if we summarize the other variables as c.

And the graph does not look like that of \(y = c e^{\frac{-1}{x}}\).

Maybe I am missing something?

The picture was a rough sketch of an exponential function.

My initial question was: why does Fida think the graph is exponential? Is it his own idea, or did someone tell him that? The function clearly does have the exponential-reciprocal form he compared it to; but when I graphed that function in Desmos, it looked nothing like an exponential.

We needed more information. I searched for the equation, and found that it is Arrhenius’ equation, used in chemistry. The first page I found with graphs, LibreTexts, had a graph that could match a small part of the exponential-reciprocal (though it had a logarithmic vertical scale), but looked nothing like the exponential. Here are the three graphs under consideration:

So I asked Fida for the source of the information: Where did you see such an exponential graph, and what units and numerical values were shown, so we could try to replicate it?

In response, Fida gave the URL of this chemistry test. Question 3 asks, “Which graph shows how the rate constant of a reaction, *k*, changes with temperature?”, and the supposedly correct graph is exponential, as shown at left above.

Now we have a context: a chemistry test is asking a *qualitative* question about variation of the rate constant with temperature, and answers it with a generic sketch with no indicated units (so we can’t tell whether temperature is in K or C, etc.). If they are right, then the graph they show must represent only a small part of the graph. In fact, if we ignore negative values of T (which is absolute temperature), and blow up the vertical scale, we see that this is reasonable:

When *x* < 0.5, the graph looks similar to an exponential. I did a little more research and responded:

Clearly they are claiming that the rate k increases exponentially with temperature, and that you are expected to know this. So we want to figure out how this relates to the Arrhenius equation, which they evidently are not expecting you to think of. I’m not a chemist, but this intrigued me enough mathematically to look into it more deeply than I ordinarily might!

I looked at what Wikipedia says about the equation.

In the introduction, they say,

A historically useful generalization supported by Arrhenius’ equation is that, for many common chemical reactions at room temperature, the reaction rate doubles for every 10 degree Celsius increase in temperature.

This describes exponential growth, but is not obviously related to the equation. This rule of thumb must be what your test’s answer is based on – assuming that students have learned this, rather than the Arrhenius equation.

They show a graph that looks exponential, with the caption, “In almost all practical cases, \(E_a \gg RT\) and k increases rapidly with T.” This tells us that the exponential growth is an approximation applicable at normal temperatures — presumably an approximation that could be derived from the equation over a limited domain.

Then they have another graph that looks like what you and I expect, with the caption, “Mathematically, at very high temperatures so that \(E_a \ll RT\), k levels off and approaches A as a limit, but this case does not occur under practical conditions.”

So the approximation presumably applies in the part of the graph where it is concave upward. The article also mentions that the “constants” in the equation may actually depend on temperature, so it makes sense to consider only a small range of temperatures.

I find the same sort of information elsewhere, such as this site. At the bottom they say this, after going through a numerical example:

You can see that the fraction of the molecules able to react has almost doubled by increasing the temperature by 10°C. That causes the rate of reaction to almost double. This is the value in the rule-of-thumb often used in simple rate of reaction work.

Note: This approximation (about the rate of a reaction doubling for a 10 degree rise in temperature) only works for reactions with activation energies of about 50 kJ mol\(^{-1}\) fairly close to room temperature. If you can be bothered, use the equation to find out what happens if you increase the temperature from, say, 1000 K to 1010 K. Work out the expression \(-E_a/RT\) and then use the \(e^x\) button on your calculator to finish the job.

The rate constant goes on increasing as the temperature goes up, but the rate of increase falls off quite rapidly at higher temperatures.

If you want to pursue this further (I would if I were you!), you might look into typical values for the variables and constants (as in that last page’s example) and make a graph such that you can zoom into the typical temperature region and see if it looks exponential, while zooming out shows the full shape of the graph. Then you might think about various approximation methods, to see if there is a way to demonstrate this mathematically. You may not be ready for that level of detail; it probably needs some more advanced calculus than you have seen.

So we have confirmation that this is an approximation that is valid only in a limited range of temperatures, which happen to be those most often encountered. The test expects students to have learned this rule of thumb, and not, like Fida, to think of Arrhenius’ equation itself, which leads to a very different answer (D rather than C). This is something I don’t like to see in tests: smarter students are penalized for knowing enough to be confused by a question.

I didn’t have time to dig more deeply into the approximation, but Doctor Rick took up the challenge:

Hi, Fida. This problem puzzled and intrigued me, too; I spent some time looking into it myself, and Doctor Peterson’s last remarks gave me further ideas. Let me share a few of my thoughts.

First, when you reduce the Arrhenius equation \(k = A e^{\frac{-E_a}{RT}}\) to \(k = c e^{\frac{-1}{T}}\), that isn’t quite right; it can’t be reduced to a single-parameter family of curves. But if we define

\(y = k/A\)

\(x = RT/E_a\)

then we get \(y = e^{\frac{-1}{x}}\). Thus it’s a two-parameter family obtained from this parent function by scaling both the x and y axes.

Now, you can apply your calculus skills to that parent function; you’ll find that it is concave-up (and therefore more like the “correct” answer to your problem than any of the others) for x < 1/2. Above that, it’s concave-down, with an asymptote at y = 1. These are things I believe you should be able to do for yourself, but I’m supplying answers so you can check your work if you want to do it.
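As a quick check of these claims about the parent function (my own sketch, not part of the discussion), a finite-difference estimate of the second derivative confirms the change of concavity at x = 1/2:

```python
import math

# Numerically verify that y = exp(-1/x) is concave up for x < 1/2
# and concave down for x > 1/2.
def f(x):
    return math.exp(-1.0 / x)

def second_derivative(x, h=1e-4):
    # central finite-difference estimate of f''(x)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

print(second_derivative(0.3) > 0)   # True: concave up below 1/2
print(second_derivative(0.8) < 0)   # True: concave down above 1/2
```

(The analytic second derivative, \(e^{-1/x}(1 - 2x)/x^4\), changes sign at exactly x = 1/2, in agreement.)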

Now I have a suggestion as to how to do what Doctor Peterson suggested, seeing if we can confirm the rule of thumb (doubling of k for a 10° increase in T). Take the natural log of both sides of the function above:

\(\ln y = -1/x\)

Now, you may have learned how to find a linear approximation to a function in the vicinity of a given point — find the slope of the curve at that point, and use the slope and the coordinates of the point to write the equation of the tangent line. Remember that we’re now taking ln y as the ordinate (you might want to change y above to, say, z, and let y = ln z).

Take \(E_a = 50{,}000\) J/mol as suggested in the source Doctor Peterson quoted, and \(T_0 = 300\). Find the corresponding value of x, and use that to write the specific linear approximation about \(T_0\). You will find that the linear approximation for y = ln z approximates z (or k) as an exponential function of T. What is the doubling interval for that function? I get close to 10 degrees!

After a little more discussion, he filled in some of the details he “left for the reader”:

We don’t have the correct values yet, but let’s just say the equation is \(y = mx + c\). This y = ln z (where z = k/A). Thus,

\(\ln z = mx + c\)

\(z = e^{mx + c} = e^c e^{mx}\)

Do you see how we now have an *exponential* approximation to the Arrhenius equation, in the vicinity of T = 300°? That’s exactly what we were looking for. And from your knowledge of exponential functions, you can relate m to the change in T that will double k.
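Putting numbers to this derivation (my own sketch, working directly with T rather than the scaled variable x, and using the example values quoted in the discussion):

```python
import math

# Values suggested in the discussion: Ea = 50,000 J/mol, T0 = 300 K.
# R is the gas constant in J/(mol*K).
R = 8.314
Ea = 50_000
T0 = 300

# Linearizing ln k = ln A - Ea/(R*T) about T0 gives slope m = Ea/(R*T0**2),
# so locally k behaves like an exponential in T.
m = Ea / (R * T0**2)

# The exponential doubles when m * dT = ln 2.
doubling_interval = math.log(2) / m
print(round(doubling_interval, 1))   # about 10.4 degrees

# Direct check from the Arrhenius equation itself: the ratio k(310)/k(300).
ratio = math.exp(-Ea / (R * (T0 + 10))) / math.exp(-Ea / (R * T0))
print(round(ratio, 2))               # about 1.91, close to doubling
```

So the 10-degree doubling rule falls out of the linearization, just as Doctor Rick found.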

We managed to derive the approximation from the Arrhenius equation, showing that it is not a mere guess. I hope this also shows the value of approximations: for typical situations, the complete equation is complicated and misleading; the approximation is much more usable. Yet the approximation alone would be wrong, as it would imply very incorrect behavior for very large temperatures. This is often how math works in science.

That was quite a journey, from a seemingly simple question about a seemingly wrong graph, to some useful calculus!

After writing this, I was curious about the details of the approximation – I wanted to do that zooming in I’d suggested. So I found the approximation near 300K, and graphed it along with the actual curve. Then I made additional approximations near 1000K and 2000K, and graphed them all. Here are the zoomed-in graph, then zoomed out:

The red curve is the Arrhenius equation; blue is the exponential approximation near 300K, which looks good when zoomed in, but totally useless beyond 500K or so, and green and purple are the other approximations. It turns out that the approximation has nothing to do with the “exponential-like” appearance (concave up) of the curve we see in the exact graph; the approximation is just an exponential that happens to touch the curve at a particular place. It is a very rough rule of thumb, and not at all a representation of reality, which is not exponential at all! And yet, it has been found useful in the real world, where temperatures do not vary wildly.

Recently I discussed the definition of the median of a data set, pointing out how it needs refinements that are not often discussed. In searching for questions in our archive on that topic, I ran across a discussion of an opposite issue: the breadth of the general term “average”, which does not have a specific definition. This is a nice example of how a seemingly simple question can lead to a discussion of wide-ranging topics that go off in different directions and end up tying many ideas together. That is part of the fun of being a Math Doctor.

Here is the question, from Danny in 2007:

What is the Meaning of "Average"? Can you please give a detailed description of average and its meaning? I'm not looking for a definition like "average is a certain # divided by a certain total #." I don't quite understand the real meaning of average.

A simple enough question, but one that could – and did – go in several directions, some of which were not initially apparent. In the end, we went beyond the definition of “average” in the sense of “arithmetic mean” as he defines it here, into other averages, and then into probability and beyond.

I started by discussing the various statistics that are sometime called “averages”, and then answered his specific question about the deeper meaning of the mean:

There are several different meanings of "average". The most general is a "measure of central tendency", meaning any statistic that in some sense represents a typical value from a data set. The mean, median, and mode are often identified as "averages" in this sense. The word "average" is also used (especially at elementary levels) to refer specifically to the mean, which is the kind of average you mentioned: add the numbers and divide by how many there are. This kind of average has a specific meaning: it is the number you could use in place of each of the values, and still have the same sum.

This is the sort of “average” that Danny initially referred to. I gave a brief example of what my definition meant, with links to fuller explanations of this sense of “average”, and of other sorts of “mean”:

What Does Average Mean? Arithmetic vs. Geometric Mean Applications of Arithmetic, Geometric, Harmonic, and Quadratic Means Average

My comments triggered a new question, about the meaning of “central tendency”. This is a complex term that is easily misunderstood. He did something wise here: he showed me what he was thinking, so I could correct it, rather than just ask another question without explanation. This is a part of good communication, and leads to profitable discussions.

Thanks for the helpful response. In your letter, you mentioned that average refers to central tendency. Let me give my interpretation of what central tendency means. Please correct me if I am wrong. For example, if our data shows that it rains 10 times over 100 days, then it means that the sky "tends" to rain 10 times per 100 days. 10 divided 100 gives a frequency value of 0.1, which means that it rains 0.1 time per day on average. This average refers to how frequently it rains. For example, if it rained 11 times (more frequent than 10), then you would get 11/100, which is a bigger value than 10/100. Thus, 11/100 is more frequent than 10/100. Is this interpretation of central tendency correct? I also think central tendency is the average value that tends to be close to MOST of the various values in the data. For instance, if my data set is (4,6,1,3,0,5,3,4) the central tendency is 3.25, which is a value that tends towards 3 and 4. There are two 3 values and two 4 values in the data, which make up most of the data set.

The word “tendency” had led Danny off in a new direction, verging on probability, which is not really what we mean by the term here; but it is connected to our topic (“it *tends* to rain 1 of 10 days, *on average*“), so this was not a huge stretch. But he seems to have missed the word “central”, which is the real key to the phrase.

There are several other little twists in Danny’s understanding here: Can it really rain 0.1 time in a day? (I didn’t use number of rainfalls, but inches of rain, in my examples. But we’ll get back to this question.) And does “central tendency” imply closeness to *most* of the data (which sounds more like the mode in particular)? I now had to dig deeper into what it *does* mean; my word “typical” above did not do much to clarify. Words are slippery, aren’t they?

What you are saying in both cases is a reasonable example of the mean, and fits with my description of average rainfall, though I used the inches of rain per day rather than the number of rainfalls. But central tendency is intended to be a much broader term. It's meant to be vague, because it covers not only means but also the median, the midrange, and even the mode. Its meaning is "any statistic that tends to fall in the middle of a set of numbers"; anything that gives a sense of what the "usual" or "typical" value is, in some sense, can be called a measure of central tendency. The *median* is, literally, the number in the middle--put the numbers in order, and take the middle number in the list, or the average of the two middle numbers if necessary. So that's clearly a "central tendency". The *midrange* is the exact middle of the range--the average, in fact, of the highest and lowest numbers. So that, too, has to lie in the middle, though it doesn't take into account how the rest of the numbers are distributed. The *mode* is the most common value, if there is one; it really doesn't have to be "in the middle", or even to exist, but it certainly fits the idea of "typical". The (arithmetic) *mean*, like all the others, has to lie within the range of the numbers, and it represents the "center of gravity" of all the numbers. So each of these fits the meaning of "measure of central tendency", each in a different way.
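For a concrete comparison (my own addition), here are all four measures computed for Danny's sample data set, using Python's statistics module (the midrange has no built-in, so it is computed directly):

```python
from statistics import mean, median, multimode

data = [4, 6, 1, 3, 0, 5, 3, 4]    # Danny's sample data set

print(mean(data))                   # 3.25: the arithmetic mean
print(median(data))                 # 3.5: average of the two middle values
print((min(data) + max(data)) / 2)  # 3.0: the midrange
print(multimode(data))              # [4, 3]: this data set is bimodal
```

Note that all four land "in the middle" of the data, each in its own sense, just as described above.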

Now Danny turned a corner and moved fully into the topic of probability:

Hello doctor, thanks for your insights, I now have a better idea of average. Here is one more question about probability. Let's say that I was sick 40 times out of 1000 days. So based on this information, the probability of me getting sick on a random day is 40/1000. ... This leads me to conclude that the probability of anything is based on the past data, and we can make good predictions of future events because of the law of continuity, meaning that things in the universe always follow a pattern. If we lived in a universe without continuity, then the knowledge of probability is useless. So if I was sick 40 times out of 1000 days in the past, then the probability of me getting sick on a random day is the average value of 40/1000 = 0.04. I like to point out that the average 0.04 doesn't have a real physical meaning, because it says that I was sick on average 0.04 times per day (0.04 times? that makes no sense). I think 0.04 is just a number that corresponds to or represents 40/1000 (40 times per 1000 days is meaningful).

So he wants confirmation of his concept of probability, which I gave:

What you're talking about here is called empirical probability: just a description of what actually happened, which can't say anything about why, or what could happen another time. It's simply a ratio: how does the number of occurrences of sickness compare to the number of days under consideration? Out of those 1000 days, 40 of them were sick days; so "on the average" 40 out of 1000, or 4 out of 100, or 1 out of 25 were sick days. If they were evenly distributed--the same idea as a mean--then every 25th day would have been a sick day.

I chose not to point out the difference between his wording (the number of times he got sick), and mine (the number of sick days); we often leave such details to be absorbed from what we say rather than deliberately confront them; but we try to give an example of careful wording.

But does the concept of probability imply, or require, that the universe must follow predictable patterns? That raises some further questions about its validity, which I answered by dipping into the philosophical realm:

Now you've made some big jumps! Not ALL of probability is just about past data; that's just empirical probability. And we can't always extrapolate from past events to the future. Sometimes that works, sometimes it doesn't. In part, it's the job of statistics to look at the data you've got and determine how valid it is to expect the same probabilities to continue--how good a sample you have. But even beyond that, whether we can assume that patterns will continue depends on other knowledge entirely, such as science. If we find a mechanism that explains a pattern, we have much better grounds for expecting it to continue than if we don't. To make a broad statement that "things in the universe ALWAYS follow a pattern" is to indulge in philosophy, not math. In probability, we go the other way: we make an ASSUMPTION that things will continue as they are, in order to be able to apply probability to predicting anything; we leave it up to scientists (or sometimes philosophers) to decide whether that is a valid assumption. The scientist will most likely do some experiments to see if the predictions based on his theory work out, and if so he has some evidence that it is valid, and he can continue to make predictions. If not, then he tries another theory! He certainly would not say that probability forces him to believe that things work a certain way. And perhaps that's what you mean to say: probability applies to a situation beyond the data we have only if there is consistency in the causes underlying the phenomena.

As we have said many times, math is not necessarily about the real world; it can be used to model the world based on observations of it, but the results must always be checked. Probability assumes a consistent world, saying, “If things continue the same, this is what we can expect.”

After pointing out the relationship of his comment about 0.04 times per day and the Law of Large Numbers, I returned to the connection of these ideas to averages:

The difference between this and the general idea of averages is that an average can apply to any collection of numbers, not just to the frequency of an occurrence. We can talk about the average speed of a car; regardless of how its speed has varied along a route, we can use the total distance traveled and the total time it took to determine the average speed, which is the speed it might have been going throughout the entire trip, in order to get the same total distance in the same total time. There is nothing probabilistic about this; but like probability, we are taking something that may vary "randomly" and condensing all its variations into a single number. The average speed does not mean that at every moment the car was going that fast, and the probability does not mean that out of every 25 days you are sick on one of them, or, worse, that on every day you are sick for 1/25 of the time. Averages and probability both ignore unevenness and look only at the big picture. And that makes your question a very good one. I've been noticing the connections between probability and averages in several areas lately, and it's good to have a chance to think more about it.
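The average-speed idea can be sketched in a few lines (a minimal example with made-up trip legs, since no specific trip is given above):

```python
# A hypothetical trip: (miles, hours) for each leg. The speeds vary
# from leg to leg, but the average speed uses only the totals.
legs = [(30, 1.0), (60, 1.0), (10, 0.5)]

total_distance = sum(d for d, _ in legs)   # 100 miles
total_time = sum(t for _, t in legs)       # 2.5 hours

print(total_distance / total_time)         # 40.0 mph
```

Driving at a constant 40 mph for the same 2.5 hours would cover the same 100 miles, which is exactly the sense in which 40 is the "average" speed.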

Finally, Danny got back to the topic of averages, with an excellent long question about things like this:

It seems like sometimes averages have no meaning. For instance, in a class of 10 students, 2 got 100 on a test, 8 got 0. The test average is 200/10 = 20. So on average every person got a 20 on the test. If I am correct in thinking that an average value is an estimate of the various values in the same data set (like you said, an average is like a center of gravity in the data set, so all the numbers in the data set should lean towards the average), then the average 20 is closer to the REAL scores of the 8 students who got 0 than to the REAL score of the 2 people who got 100. This average gives a vague idea of how badly most people did, but it has "hidden" the two perfect scores. The average may tell us that most of the people must have done badly so that the average comes out to be so low. However, we can't know that some people did perfectly just by looking at the average.

Note that he very nicely paraphrased “central tendency” as “all the numbers in the data set should lean towards the average” – I think he got it!

Here are some excerpts of my discussion of this new topic:

Several of the pages on our site that discuss mean, median, and mode talk about why you would choose one rather than another. Each has its uses, and what you're saying is that for some purposes the mean is not the appropriate "measure of central tendency". That doesn't mean that it is meaningless, or that it is never a valid concept; only that it doesn't tell you what you'd like to know in this situation. ... Another classic example of this is median income. If in your town 999 people earned $1000 a year, and one man earned $9,000,000 a year, the average (mean) income would be 10,000 a year, even though NOBODY made that amount. [Oops - I meant 9,999.] The median income gives a much better picture, if you want to know how the "average" person is doing; but that entirely misses the fact that there is one person who is rich. No matter what "average" you use, you'll be leaving someone out. Another example is the rainfall I like to use to illustrate the idea of the mean. If the average rainfall is 1 inch a day, say, it might actually have been dry as a bone for 99 days, and then there was a 100 inch flood on the last day. The average accurately reflects the TOTAL amount of rain over the 100 days, but that isn't all it takes to decide what plants can survive there. Again, the whole idea of an average is to try to boil down a lot of information into one number. That necessarily means that you have to lose some information. (That's why people don't want to be treated as mere numbers; they are more complex than that. Even a set of numbers doesn't like to be replaced by a single number!) ... Incidentally, I've sometimes noticed in teaching, as a result of these statistics, that I can't "teach to the middle" of the class, because there is no middle. Sometimes I find a bimodal distribution, which means that I have a lot of F's and a lot of B's, and no one in between where the median and the mean both lie. (The last word there is an interesting, and very appropriate, pun!) 
So I have to ignore the statistics and teach the students.
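Both numerical examples above are easy to reproduce (my own sketch, using Python's statistics module):

```python
from statistics import mean, median

scores = [100] * 2 + [0] * 8          # Danny's test-score example
print(mean(scores))                   # 20: hides the two perfect scores

incomes = [1000] * 999 + [9_000_000]  # the income example from the answer
print(mean(incomes))                  # 9999: the corrected figure
print(median(incomes))                # 1000: what the "average" person makes
```

The gap between the mean and median incomes is the whole point: each single number loses different information.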

As always, I’ve left out a lot, so you’ll have to read the original if you are interested. Some of these long discussions bring out a lot that is worth pondering.

The first thing to tell a student is that rounding to the nearest whole number (or nearest tenth, or nearest hundred, or whatever) means just what it says: making a **round** (approximate) number that is the **nearest** of its type to the given number. From that definition, we move on to procedures for finding that number efficiently, which is a separate issue. So many students seem to learn only the rules, and not the meaning, so that when they mix up the rule, they can end up with a nonsensical number — one that is not at all near the target, or that is not the type of number they are aiming for, for example.

Here is a question from Keith in 2006 where I got into the main issues:

Different Techniques for Rounding Off Decimal Numbers I am a little split on the different ideas of rounding decimals. I understand that in basic math they tell you to take the decimal place just to the right of what you are going to round to (ex: 1.45 to the tenth is 1.5). However what I was taught in advanced mathematics is when you are taking a number that has multiple numbers beyond the decimal place that you should take the complete number in account when rounding. (ex: 1.4457 would round to 1.5 or 2.0). I don't know if it is just the way different teachers teach or maybe it is a different form of math that involves this kind of rounding. Could you help clarify?

There were several different things he might be confused about, so I chose to give him an overview of what rounding is, starting with the definition:

First, the essential idea in rounding is stated in the words we use when we state a problem fully: Round 1.45 to the nearest tenth. This means Find the multiple of 0.1 that is closest to the number 1.45. If we just do exactly what this says (which does, indeed, take the entire number into account), then we can't help getting the right answer--if there really is one! The two nearest multiples of 0.1 are the tenth above and the tenth below, namely 1.4 and 1.5. If our number had been, say, 1.445, we could find how far it is from 1.4 and from 1.5, and choose 1.4 because that is, indeed, closer. In the case we are considering, however, we find that both numbers are exactly the same distance from 1.45, so there really is NO correct answer. So if we want to have one "correct" answer, we have to arbitrarily choose one of the two. This is where trouble comes in: we have a choice, and we might make different choices depending on our concerns.

What we see here is that, first, using the *definition* directly is a little laborious, so we need a simple *procedure*; but, second, there are cases (when a number is exactly halfway between two nearest numbers) when the definition doesn’t give us an answer, and we have to make an *arbitrary choice* (unless we chose to say the number couldn’t be rounded!). This means that the simple procedure we invent may not be the same as someone else might prefer; it depends on our criteria for the arbitrary choice.

In teaching children, and in using rounding for simple purposes such as estimation, we want the simplest possible method. A reasonable way to do it is this: Any number between 1.4 and 1.45 will round down; any number between 1.45 and 1.5 will round up. The first group all have the NEXT digit less than 5 (0, 1, 2, 3, or 4); the second group all have their next digit five or more (5, 6, 7, 8, or 9). If we arbitrarily choose to round 1.45 up, then the rule becomes very simple: ANY number with a 5 in the next digit will round up. This rule gives the correct answer whenever there is one correct answer, and gives a valid answer when there are two.

This, in my experience, is the usual elementary method taught in America, and probably in many other places. We have often been asked why we “always round up on 5”, and the reason is just that this produces the simplest possible rule to teach — not that it is the only correct thing to do. Note that the *definition* “takes the complete number into account”, in Keith’s words; but this *procedure* only *needs* to look at the next digit in order to accomplish the same result.

There are times when we have reason to do something other than what is simplest:

In some settings, we care about the statistical properties of our rounding; we don't want to skew averages by always rounding up in this odd situation, so we'd like to round up half the time and round down half the time. One common solution is to always round such "exactly between" numbers so that the last digit is EVEN. Then our 1.45 will round to 1.4 rather than to 1.5. Note that this rule is identical to the other in all cases except exact halfway numbers like 1.45; both methods give the same answer when there is only one valid answer. It makes a different choice in the special case. And it "looks at the entire number" only in the sense that it checks whether there are any other digits following a 5. If there are, then we follow the basic rule; if not, we round to an even number. But there is nothing else beyond the next digit that matters. Even this method can lead to biases (for example, it will lead to too many even answers!); so if that matters, you might need to just randomly decide whether to round halfway numbers up or down.

So this method is just a trick that can help in some special statistical situations; it is not a cure-all. And it is not, as some people tell us, the only right way because it is what they were taught.
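For readers who like to see such rules in code, here is a minimal sketch (my own, not part of the original discussion) using Python’s `decimal` module, which does exact decimal arithmetic and so avoids the binary-float traps that make the built-in `round` misleading for examples like these. `ROUND_HALF_UP` is the simple school rule; `ROUND_HALF_EVEN` is the statistician’s rule; they disagree only on exact ties.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def round_tenths(s, rule):
    # Quantize a decimal string to one decimal place under the given rule.
    return Decimal(s).quantize(Decimal("0.1"), rounding=rule)

# Exact ties: the two rules can disagree.
assert round_tenths("1.45", ROUND_HALF_UP) == Decimal("1.5")    # always up on 5
assert round_tenths("1.45", ROUND_HALF_EVEN) == Decimal("1.4")  # 4 is even: stay
assert round_tenths("1.55", ROUND_HALF_UP) == Decimal("1.6")
assert round_tenths("1.55", ROUND_HALF_EVEN) == Decimal("1.6")  # 6 is even: go up

# Not a tie: a nonzero digit beyond the 5 puts the number past the halfway
# point, so BOTH rules round it up.
assert round_tenths("75.451", ROUND_HALF_UP) == Decimal("75.5")
assert round_tenths("75.451", ROUND_HALF_EVEN) == Decimal("75.5")
```

Using decimal strings rather than float literals here is deliberate: `1.45` as a binary float is not exactly 1.45, so it is not actually a tie, and the rules would appear to behave strangely.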

Here is an example, from 2001, where Debbie asked about the “round-to-even” rule, and turned out to have missed the most important detail:

Rounding Decimals: Even/Odd Issues

This is a strange question... I remember learning a way to round certain numbers, but now I think I am going crazy! Please help - here is what I remember: When an even decimal (or any even number) is followed by a 5, you round down. When an odd decimal is followed by a 5, you round up. For example: 75.45 = 75.4, but 75.55 = 75.6. Or rounding to the nearest 10: 145 = 140, but 155 = 160. Can you let me know if I learned the "rule" incorrectly? It seemed from the info on your page, I did learn it incorrectly; but it has persisted in my memory! As you've said on your page, statistically we should "round down the first 5 (1-4) and round up the last 5 (5-9)."

She had seen what we had written to students who asked about the simple method (whom we hadn’t told that there was any other way). I explained the round-to-nearest-even rule, as I did above, and also referred to a previous answer we had given in 1999 to a similar question to hers (Rounding Up or Down on a 5).

Debbie wrote back:

Thanks very much! Now I know I've not gone completely nuts! But the caveat you suggest is one I do overlook, and should be considered ("for example, you wouldn't round 75.451 down to 75.4, because there is another digit beyond the 5, and 75.451 is closer to 75.5 than to 75.4."). And thanks for replying so quickly. I've bookmarked your page - even though I teach college students, sometimes they need to be reminded of stuff they already learned!

The fact that this detail is easily overlooked is one reason we usually teach only the simple method to kids: there’s less to pay attention to. (And note that teachers need to be reminded, too. That includes me.)

Now let’s look at a different issue involving what digits we should look at:

Rounding 3.445 to the Tenths Place

In my daughter's 6th grade math class, they are told to address the digit to the right of the place being rounded. For example, 3.445 rounded to the tenths place, would be 3.4, since the number to the right of the tenth place is less than 5. However, doesn't the presence of the 5 in the one-thousandths place round the 4 hundredths to 5 hundredths, which in turn would round the 4 tenths to 5 tenths?

Doctor Rick and I gave supplementary answers to this. He explained the reason for the method (in a little more detail than I did above, if you want to see it done right), showing why there is no *need* to look beyond the next digit. I went one step further, and emphasized that it is actually *wrong* to look beyond it:

The problem here is that **you have to round all at once, not one digit at a time**. Rounding twice, to different digits, doesn't do what you would think it would. Here's what happens: Since 3.445 is closer to 3.4 than to 3.5, it must round to 3.4; the border between 3.4 and 3.5 is at 3.45, and 3.445 is below that. But if you first round it to the nearest hundredth, it becomes 3.45, moving it from "below the border" to "right on the border" and allowing a second rounding to move it "over the border" to 3.5. It's as if the border patrol were to decree that anyone within ten feet of the boundary fence should be considered to be on the fence; and then said that anyone on the fence should be arrested for illegal entry. That wouldn't be right, since people ten feet outside of the country would be treated as if they were inside!

In case it isn’t clear what I am saying, here is the picture I had in mind:

```
                      3.445
                        \/
+----+----+----+----+----+----+----+----+----+----+
3.4  3.41 3.42 3.43 3.44 3.45 3.46 3.47 3.48 3.49 3.5
```

I have marked 3.445, which clearly is to the left of the border between 3.4 territory and 3.5 territory. The nearest tenth is clearly 3.4. But if we first rounded it to the nearest hundredth, 3.45, well, that is the boundary for the nearest tenth, and by our standard rule, when we now round to the nearest tenth, anything on the boundary rounds up to 3.5.
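The double-rounding error is easy to demonstrate in a couple of lines. This sketch (mine, using Python’s exact `decimal` arithmetic) rounds 3.445 to tenths once directly, and then again in two steps, showing how the intermediate rounding carries it over the border:

```python
from decimal import Decimal, ROUND_HALF_UP

x = Decimal("3.445")

# Round once, directly to tenths: 3.445 is below the 3.45 border.
once = x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Round twice: first to hundredths (giving 3.45), then to tenths.
# The first step moves the number ONTO the border, and the tie-breaking
# rule in the second step then carries it over.
twice = x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) \
         .quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

assert once == Decimal("3.4")   # correct
assert twice == Decimal("3.5")  # the double-rounding error
```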

Let’s look at one more issue. Another teacher, Susan, in 2000, asked this:

Rounding Down to Nothing

In my third grade class, we are working on rounding numbers. A question came up about rounding the numbers 0, 1, 2, 3, and 4 to the nearest 10. The nearest 10 would be zero, but that seems to say that you would be "rounding" down to nothing. That seems inaccurate, since you do have "some." Would you round down to 1? Would you consider negative numbers here? We can't seem to find information on this subject.

It just felt wrong to them: How could rounding change something to nothing? Well, this is where a central feature of math, making definitions and sticking to them, comes to the rescue. Even if it feels wrong, we have to follow the rules. But it helps when we know why the rule makes sense:

The problem with rounding small numbers down to zero is not that the rounding itself is wrong, but that one would not ordinarily want to do it. As you say, rounding loses a lot of accuracy - in fact, it loses all the information you have. For precisely that reason, we would rarely round a number that way. For example, suppose you measured the height of everyone in your class, and got numbers like, say, 1.234 meters. If I asked you to round them to the nearest ten meters, you'd probably question my choice, since they would all round to zero. Even if we round to the nearest meter, we'll lose all our information, since all the numbers will be 1. Instead, we would probably choose to round to the nearest centimeter, in order to avoid losing data. But if you were measuring the heights of mountains, with some numbers in the kilometers and others (say, in Delaware) only a few meters, then rounding to the nearest ten meters would make sense, even if some "mountains" (sand dunes?) rounded to 0. You would still have useful information; the zero would tell you a lot about the height compared to real mountains. ... If the result is zero, it's not the answer that's wrong, but the question.

Context is everything. You choose to round in a certain way for a reason; if you round to the nearest ten meters, then it must be because not everything you are measuring will round to zero. But if this means your height rounds to zero, you do it, because any other answer would not really be the nearest ten meters! In math, we don’t lie just to avoid hurting a number’s feelings; truth matters.


One of the benefits of being a Math Doctor is interacting with the math of many cultures around the world, as we attract an international following. We have observed variations in terminology and notation from country to country, as well as variations in the content of math education (some better than what we know, some worse, some just different). It has become clear that the way *we* know math is not what *everyone* learns. Math may be universal, but the way it is *expressed* is not. (In what I write here, I will be reflecting my own perspective, having learned math in the U.S.; Math Doctors from Europe or other places may find that what I call unfamiliar is totally normal!)

One thing that has particularly impressed me from some recent questions is the wonderful notation that is evidently used in Turkey (and presumably elsewhere) for lines in geometry, which I had never seen (or noticed, at least) until now, and like very much. It is so clear and simple, and so consistent with other fields of math, that I can’t conceive of a better way. But I have not been able to find any information about it in order to be sure where it originated, how widespread it is, or why it is not better known in the English-speaking world.

[After writing this, I found a couple references indicating that it is used in Germany, and a similar notation is used in France. But why is it so hard to find any reference to it in English-language materials?]

We have received many questions from one Turkish student in particular recently. The problems are often very interesting in themselves, but rather than focus on our discussion with him about any one of these, I just want to look at a couple examples from a practice test he showed us.

Here are two problems, which I chose just because of the notation they demonstrate:

Problem 62:

Şekilde ABCD deltoidinin alanı 160 birimkaredir.

Buna göre, ABCD deltoidinin çevresi kaç birimdir?

A) \(20 \sqrt{5}\) B) \(24 \sqrt{5}\) C) \(28 \sqrt{5}\) D) \(30 \sqrt{5}\) E) \(32 \sqrt{5}\)

Google translation:

ABCD deltoid

\([AC] \perp [BC]\)

\(|AB| = |BC|\)

\(|AD| = |DC|\)

\(|BE| = 4|ED|\)

\(|AC| = 16\) units

The area of the ABCD deltoid is 160 units.

How many units is the circumference of the ABCD deltoid?

A) \(20 \sqrt{5}\) B) \(24 \sqrt{5}\) C) \(28 \sqrt{5}\) D) \(30 \sqrt{5}\) E) \(32 \sqrt{5}\)

In other words, in kite ABCD, segment AC is perpendicular to segment BC; the lengths of AB and BC, and of AD and DC are equal; and the length of segment BE is 4 times the length of ED. The length of segment AC is 16 units. If the area of the kite is 160 square units, what is the perimeter of the kite?

We see here that **[XY]**, the notation we use in algebra for a closed interval, is used for a **segment** (where I am used to \(\overline{XY}\)). This is perfect: it means the set of all points between X and Y, including the endpoints. Similarly, **|XY|**, the notation we use for the absolute value, or distance between points on a number line, is reused for the **length** of a segment (distance between points in space), where I am used to \(m\left(\overline{XY}\right)\), though many texts just use \(XY\).

[We might quibble about the use of “units” rather than “square units” for the area, but perhaps “unit” here isn’t meant as a stand-in for a specific unit of length as in English math, but just refers to whatever unit (of length or area) is appropriate. Who am I to criticize someone else’s use of their own language? I have learned as a Math Doctor not to judge what may just be a difference of culture.]
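[If you want to check the numbers in Problem 62, a few lines of arithmetic suffice. This is my own sketch, and it assumes, as the figure presumably shows, that the perpendicularity is between the diagonals, \([AC] \perp [BD]\), with E their intersection point; the axis of symmetry BD of a kite perpendicularly bisects AC.]

```python
import math

AC = 16                    # given diagonal
area = 160                 # kite area = (1/2) * d1 * d2
BD = 2 * area / AC         # so BD = 20

# |BE| = 4|ED| along diagonal BD gives ED = 4 and BE = 16.
ED = BD / 5
BE = 4 * ED

# The axis BD bisects AC at E, so AE = EC = 8.
AE = AC / 2

AB = math.hypot(AE, BE)    # sqrt(64 + 256) = 8*sqrt(5); AB = BC
AD = math.hypot(AE, ED)    # sqrt(64 + 16)  = 4*sqrt(5); AD = DC
perimeter = 2 * (AB + AD)

assert math.isclose(perimeter, 24 * math.sqrt(5))   # choice B
```

Under these assumptions the perimeter comes out to \(24\sqrt{5}\), which is one of the listed choices.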

Here is another:

Problem 64:

Yukarıdaki şekilde [AC] ve [BE], O merkezli çemberin çaplarıdır.

Buna göre, x kaç derecedir?

Google translation (probably could be improved):

O centered circle

\(AC // ED\)

\(AC ∩ FD = {O}\)

\(m(\widehat{BOC}) = 60°\)

\(m(\widehat{DFB}) = x\)

In the above [AC] and [BE], the center circle diameters.

According to this, how many are x?

A) 75 B) 80 C) 90 D) 105 E) 120

That is, we are given a circle with center at O, and know that line AC is parallel to line ED; line AC intersects FD at point O, and the measure of angle BOC is 60°; what is the measure of angle DFB?

Here, we see that **XY** without brackets represents a **line** (not restricted to points between X and Y), instead of requiring the difficult typography of \(\overleftrightarrow{XY}\); and the **intersection** of two lines is indicated by the usual symbol for the intersection of two sets. The notation for the measure of an angle is similar to that used elsewhere, and probably could be made more consistent.

[My searching suggests that in France, (XY) is used for a line, and XY for the length of a segment. This is a little less consistent, since as an interval, (x, y) does not continue past x and y; but it is nice in a way, as it suggests a lack of endpoints. I might perhaps suggest using <XY> for a line.]

We should take a moment to look at the problems themselves. No difficult geometry is required for these particular problems (some are much harder), but they are not routine, either. They require thinking in several steps, forcing you to find a path through the problem. Any questions? (If I get comments asking about them, we can discuss them!)

Okay, I’ll relent and show you one of the problems Sinan has asked us about, from a practice book for the test he showed us. Here is the question (which came with a sketch, but there is enough information here to make it yourself):

ABC is a triangle.

D is element of [AC]

|AB| = |AD|

|BD| = 24 units

|DC| = 7 units

What is the minimum value of perimeter of ABC as an integer?

My work:

ABC = 2a + 7 + |BC|

2a + 7 > 31

BC > 25

Perimeter of ABC > 56

But Book says 63.

He implicitly defined *a* = |AB|, and used the triangle inequality. Here is my answer:

When your answer is that the minimum is less than the claimed actual minimum, that tells you that you have not taken into account all conditions; adding another condition will tighten the “feasible interval”, making your assumed minimum unattainable.

What you have missed here is that 2a+7 does not have its minimum at the same time that BC has its minimum; one increases (toward infinity) while the other decreases (toward 25). So you can’t just combine the two inequalities.

I don’t have a formal solution, but just imagined that we adjust the length a = AB = AD while keeping BD fixed. When a is large, A is far from BD and DC approaches perpendicularity to BD, making BC smaller. When a is small, A is close to BD, and DC approaches collinearity with BD. Taking the latter to the extreme, A and C would lie on BD extended, making the perimeter equal 12 + 12 + 7 + 31 = 62. That is not a valid triangle, so the smallest integer value is 63.

This reasoning is not a proof; it is conceivable that as a increases, BC might decrease faster than 2a increases, so that there would be a smallest perimeter other than at the extreme. But a little more thinking convinces me that this would not happen; the rate of decrease of BC will be near zero at first.
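That informal argument can be checked numerically. The sketch below (my own, not part of the original answer) uses coordinates of my choosing: B = (0, 0), D = (24, 0), A above the midpoint of BD at height \(h = \sqrt{a^2 - 144}\), and C on ray AD, 7 units past D.

```python
import math

def perimeter(a):
    # Isosceles triangle ABD with AB = AD = a and BD = 24;
    # A sits above the midpoint of BD.
    h = math.sqrt(a * a - 144)
    A = (12.0, h)
    D = (24.0, 0.0)
    # C is on ray A->D, a further 7 units past D.
    ux, uy = (D[0] - A[0]) / a, (D[1] - A[1]) / a
    C = (D[0] + 7 * ux, D[1] + 7 * uy)
    BC = math.hypot(C[0], C[1])       # B is the origin
    return a + BC + (a + 7)           # AB + BC + CA, with CA = AD + DC

# As a shrinks toward 12 the perimeter falls toward 62 (the degenerate,
# collinear case), and it grows as a increases; so 62 is an unattained
# infimum, and the least attainable integer perimeter is 63, as the book says.
samples = [perimeter(12 + t) for t in (0.001, 0.1, 1, 5, 20)]
assert all(p > 62 for p in samples)
assert samples == sorted(samples)     # increasing in a, at these samples
assert samples[0] < 62.01             # already within 0.01 of 62
```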

One thing interesting here (and in several other problems Sinan sent us) is the restriction to integer values (asking either for the greatest or least, or the number of integer values possible). This makes it possible to have a multiple-choice problem that is hard to solve by working backward from the answers, since the integer answer may not in itself be significant. Also, at least the way I tend to solve them, it involves at least as much a *feel* for how geometry works as it does formalism or proof.

Sometimes, we get a question that seems familiar, and can look back at past questions to see if we have already answered it, or to get ideas about what it means. This is the story of a problem that seems to float around as folklore, in varied forms. At first it seemed unanswerable, because each version seemed to lack information; but with time we figured out how to make sense of it. Its meaning may depend on knowledge of the cultural context in which it was first asked, so that in other cultures it can become an entirely different kind of puzzle.

Here is how I first met the problem, as sent by Mark in 2002 (bold indicates features I’ll be pointing out later):

The Case of the 90 Apples

A father takes his 3 sons to a small town nearby. He places the oldest son at the first corner, the second son at the second corner, and the youngest son at the third corner. He then gives the oldest son a basket of 50 apples, the second son a basket of 30 apples, and the youngest son a basket of 10 apples. He tells them that the oldest son has the right to put up **a sign that states how much each apple costs, or how much as many apples as he prefers can be sold for**. The other sons have to follow that sign and sell that same amount for the same price. The sons cannot exchange or trade apples. They still have to come up with the same amount of money at the end of the day when the father comes to collect. They each have to sell their full stock of apples. The first has to sell 50, the second 30, and the third 10, and they all have to give the father the same amount of money without sharing among themselves. What throws me off is that the 50 apples can cost the same number of dollars. Please help.

Clearly, it involves some sort of trick. I remembered seeing similar problems, so I searched, hoping for clarification. Just a month earlier, Kumar had asked this less detailed version, which went unanswered:

A Father has three children. He gave each of them some mangoes as follows: He gave 50 mangoes to his first son, He gave 30 mangoes to his second son, and lastly He gave 10 mangoes to his third son. They **have to sell the mangoes with same price** and they should bring home same amount of money. For example, **if first son sells 5 mangoes for $10.00, second and third sons also have to sell 5 mangoes for $10.00.** However, they should bring the equal amount of money when they come back home. None of them are sharing mangoes with each other. How is this possible?

The previous year, Cam had sent the following version, which sounds more ancient, but was his Problem of the Week:

There once lived in Damascus a peasant who bragged to a 'qadi', a judge, that his daughters were not only very intelligent, but blessed with rare skills of the imagination. The qadi had the three girls brought before him. Then he said to them, "Here are 90 apples for you to sell in the market. Fatima the oldest, you will take 50; Cunda you will take 30; and Shia, the youngest, you will take 10. **If Fatima sells her apples at a price of 7 to the dinar, you other two will have to sell yours at the same price. And if Fatima sells her apples for 3 dinar each, you two will have to do the same.** But no matter what you do, each of you must end up with the same amount of money from your different numbers of apples." "But can I not give away some of the apples that I have?" asked Fatima. "Under no circumstances." said the qadi. "These are the terms: Fatima must sell 50 apples. Cunda must sell 30 apples. And Shia must sell the 10 apples that remain. And all of you must earn exactly the same profit in the end." The three sisters need directions to resolve the problem of the 90 apples before they get to the market.

Doctor Budrow answered this one, suggesting they sold apples to one another; I suggested that the only possible answer was that they all gave away all their apples (not just “some”). These sorts of trick answers are not satisfying; but I supported my idea by pointing out that zero was discovered/invented in a setting that matches the story.

In 1997 Benno had sent us the following version, which he said came from a book, *The Man Who Counted: A Collection of Mathematical Adventures*, by a Brazilian math teacher writing as “Malba Tahan”, which is presumably the original source:

The case of the 90 apples. To prove that three rustic daughters have great intelligence, they must solve a problem proposed by a jealous Cadi. The Cadi gives 90 apples to the girls and says: These apples you must sell in the market. Fatima, the eldest, takes 50, Cunda 30, and Siha, the youngest, the remainder (10). **If Fatima sells 7 apples for 1 dinar (the Arabian money at that time) the others will sell the apples at the same price (7 for 1 dinar). If Fatima sells 1 apple for 3 dinars this is the price at which the others must sell their apples.** The whole transaction must be accomplished in such a way that the three girls get the same amount of money.

(I later found the English translation of the book online; it turns out that Cam’s version above is a more or less direct quotation from the book, while Benno’s is a paraphrase with different spellings, possibly because it was translated independently from the original.)

Notice that the details of the rules vary, but all of them have a feature that I missed: they leave open the possibility that the prices could be changed.

A version sent by Scott in 2000, however, doesn’t seem to allow for changing prices:

Farmer A has 10 eggs. He takes them to market, and **writes a price on the board**. He sells all of his eggs and goes home. Farmer B has 30 eggs. He takes them to the same market. He sees the price written by farmer A and **decides to use that price**. He sells all of his eggs and goes home. Farmer C has 50 eggs, **uses the same price**, sells the eggs, and **goes home**. All three farmers make the same amount of money. What was the selling price of the eggs?

The same is true of this very simplified version from Greg in 2002, which has no clues as to the trick:

Three Vendors are to sell their apple at the same price. Vendor (A) has 10 Apples, Vendor (B) has 30 Apples, and Vendor (C) has 50 Apples. **All apples must be sold at the same price.** All vendors after selling their apples will end up with the same amount of monies.

In 2004, Paul, having read the discussion above, wrote to tell me that allowing prices to change from time to time made a solution possible:

The way the question was stated to me was thus:

Farmer A has 50 apples
Farmer B has 30 apples
Farmer C has 10 apples

**Farmer A sets the price and they all sell their apples for that price. He may change the set price at any time.** They all sell all their apples and go home with the same amount of money.

The solution is: Farmer A sets the price at 7 apples/$1. Farmer A sells 49 (makes $7), B sells 28 (makes $4) and C sells 7 (makes $1). Then farmer A changes the price to $3/apple, and they all sell what they have left. They all end up with $10.

I believe this is the correct solution to the problem. The version you suggest "appears to be the original" allows for varying prices and even mentions the prices in my solution.

Finally we have a more or less satisfying answer; some of the versions we’ve seen seem to disallow this, so I have to suspect that their authors, if they claimed to have an answer, probably assumed it was the sort of trick I had guessed.

But we still have the question, how would you solve this? I discovered that if the mere possibility of prices varying from time to time is the trick, then there are many solutions. I showed that there are two sets of amounts that could be sold at the prices specified in Paul’s version, and that there are other pairs of prices that work; for example, if they are first sold at $1 per apple (selling 49, 25, and 1 apple, respectively), and then at $6 per apple (selling 1, 5, and 9), then everyone earns $55.
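To illustrate, here is a brute-force sketch (my own, not from the original discussion) of that search for the $1-then-$6 pair of prices: each seller sells some apples in the first phase and the rest in the second, and we keep the splits where everyone earns the same amount. Even this single pair of prices admits three solutions.

```python
def equal_money_splits(counts, p1, p2):
    # counts: apples held by each seller; seller i sells x_i at price p1,
    # then the rest at price p2. Return splits where all earn the same.
    n1, n2, n3 = counts
    solutions = []
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            for x3 in range(n3 + 1):
                earnings = {p1 * x + p2 * (n - x)
                            for x, n in zip((x1, x2, x3), counts)}
                if len(earnings) == 1:
                    solutions.append(((x1, x2, x3), earnings.pop()))
    return solutions

sols = equal_money_splits((50, 30, 10), 1, 6)
# Three distinct splits work, each with its own common total:
assert ((49, 25, 1), 55) in sols   # the split mentioned above: $55 each
assert ((48, 24, 0), 60) in sols
assert ((50, 26, 2), 50) in sols
assert len(sols) == 3
```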

A problem with no genuine solution, and one with many possible solutions, are both disappointing to a mathematician (unless we were asked for *all* possible solutions and the rules were clear – that would be a nice challenge).

But there’s more! In 2013, another Paul wrote, suggesting a slightly different interpretation: that prices don’t change, but rather are sold at a combination of prices, like $2 per dozen and $1 each for individual apples.

My answer: $2 per dozen and $1 ea for the remainder.

50 = 4 doz at $2 a doz ( $8 ) & the remainder at $1 ea ( $2 ) = $10

30 = 2 doz at $2 a doz ( $4 ) & the remainder at $1 ea ( $6 ) = $10

10 = 0 doz ( $0 ) & the remainder at $1 ea ( $10 ) = $10

This doesn’t fit most versions, but does work in some that don’t allow for changing prices – including, if you read it right, the version I started with: “a sign that states how much each apple costs, or how much as many apples as he prefers can be sold for”.

I then found that Paul’s solution was in fact one of those I had found for the varying-price model; and interestingly, his was based on dozens, while Paul 2004’s solution was based on sevens, which probably was the cultural equivalent of dozens in the original problem. This suggests that people in different cultures might make their own assumptions about how fruit would be sold, and *that assumption* provides the missing key to the problem. I proceeded to try out different possible assumptions and found that not all package sizes yield solutions, but more than one do.
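Here is a sketch of that experiment (my own reconstruction, with arbitrary search bounds): look for a package size k, a package price, and a per-piece price under which 50, 30, and 10 apples all bring in the same money.

```python
def revenue(n, k, package_price, piece_price):
    # Sell full packages of k, then the leftovers one at a time.
    return (n // k) * package_price + (n % k) * piece_price

def schemes(counts, max_k=20, max_price=12):
    found = []
    for k in range(2, max_k + 1):
        for package_price in range(1, max_price + 1):
            for piece_price in range(1, max_price + 1):
                totals = {revenue(n, k, package_price, piece_price)
                          for n in counts}
                if len(totals) == 1:
                    found.append((k, package_price, piece_price, totals.pop()))
    return found

sols = schemes((50, 30, 10))
assert (12, 2, 1, 10) in sols  # 2013 Paul: $2 a dozen, $1 apiece
assert (7, 1, 3, 10) in sols   # 2004 Paul: 7 for 1 dinar, then 3 dinars each
assert len(sols) > 2           # ...and these are not the only package sizes
```

Note that in the 7-for-a-dinar scheme the leftover pieces cost *more* than the package rate, which is exactly the two-phase pricing of the original story read as a package deal.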

And still there’s more! In 2016 Iqbal sent us this similar problem:

The Case of the 90 Apples, Redux

Four men have 70, 80, 90, and 100 oranges, respectively. Each got $20 selling all the oranges he had **using the same logical sentence**.

He wrote back 7 months later to show the solution he had found:

I am contacting you after a long time to announce that my puzzle has been solved.

| Oranges | Dozens | Price per Dozen | Subtotal (Dozens) | Remainder | Price per Piece | Subtotal (Pieces) | Grand Total |
|--------:|-------:|----------------:|------------------:|----------:|----------------:|------------------:|------------:|
| 70      | 5      | 2               | 10                | 10        | 1               | 10                | 20          |
| 80      | 6      | 2               | 12                | 8         | 1               | 8                 | 20          |
| 90      | 7      | 2               | 14                | 6         | 1               | 6                 | 20          |
| 100     | 8      | 2               | 16                | 4         | 1               | 4                 | 20          |

I recognized then that his interpretation of the cryptic problem makes it equivalent to the *qadi* problem. In response, I referred him to *The Case of the 90 Apples*, and commented,

I did not figure out how the question was meant to be taken because neither Mark nor Paul stated the problem clearly (especially in some of its versions). Even after subsequent submissions, I'm not sure I fully understood how the problem was supposed to be interpreted. My response was to try to analytically find all possible solutions within a general class of rules, because it was clear to me that the solution given was not unique, but appeared to be the result of guessing, perhaps based on cultural assumptions that fruit would be sold in sevens or in dozens. In fact, I showed several solutions using different groupings. Your version was extremely unclear (because of the vague term "logical sentence"), but your answer has the same form. A clearer restatement might read like so:

Four men have 70, 80, 90, and 100 oranges respectively. **They sold all their oranges using the same pricing scheme, with set prices for different size packages of oranges,** and got 20 dollars each. What was that scheme?

The scheme has the form, "$X for each ___, and $Y for each additional orange," and your solution is that if the scheme was "$2 for each dozen, and $1 for each additional orange," then each of the four would make the required $20.

I found a way to arrive at Iqbal’s solution efficiently, and noted that because his version is more restrictive than the others, it may have only one solution. He then created a couple more similar problems by modifying the numbers. (That’s the sign of a mathematician: not stopping with a solution, but extending the problem!)
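A brute-force check (my own, within modest search bounds, and adding the assumption that a package should cost more than a single piece) shows how restrictive Iqbal’s version is: requiring all four counts to yield exactly $20 leaves only the dozens scheme.

```python
def revenue(n, k, package_price, piece_price):
    # Sell full packages of k, then the leftovers one at a time.
    return (n // k) * package_price + (n % k) * piece_price

counts = (70, 80, 90, 100)
sols = [(k, package_price, piece_price)
        for k in range(2, 21)
        for package_price in range(2, 21)
        for piece_price in range(1, package_price)  # package > one piece
        if all(revenue(n, k, package_price, piece_price) == 20
               for n in counts)]

assert sols == [(12, 2, 1)]   # $2 a dozen, $1 apiece: the only scheme found
```

(Dropping the package-costs-more assumption admits one degenerate extra, a package of 11 priced the same as a single piece, so the sanity constraint does some work here.)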

This last version, in my formulation, cleans up both the meaning of the problem, and the method of solution. Once you know what the problem means, it’s actually not that hard.

I had always wondered why no one ever quoted the answer from the various sources where they got the problem; no one ever told us they had an authoritative answer, even the one who referred to the book. I finally was able to read the next page of the book (which was hidden when I first located it), and found that it *does* give the answer:

The solution to the problem with which the qadi of Damascus tormented the three sisters is the following:

"Fatima starts selling her apples at a price of 7 apples for 1 dinar. She sells 49 of her apples at this price, but keeps back 1.

"Cunda sells 28 of her apples at this price, but keeps back 2.

"Shia sells 7 of her apples at this price, but keeps back 3.

"Then Fatima sells her 1 remaining apple for 3 dinars. In accordance with the rules of the qadi, Cunda then sells her 2 remaining apples for 3 dinars each. And Shia then sells her 3 remaining apples for 3 dinars each.

"Thus:

Fatima — First phase: 49 apples bring a profit of 7 dinars. Second phase: 1 apple brings a profit of 3 dinars. Total: 50 apples bring a profit of 10 dinars.

Cunda — First phase: 28 apples bring a profit of 4 dinars. Second phase: 2 apples bring a profit of 6 dinars. Total: 30 apples bring a profit of 10 dinars.

Shia — First phase: 7 apples bring a profit of 1 dinar. Second phase: 3 apples bring a profit of 9 dinars. Total: 10 apples bring a profit of 10 dinars.

"Therefore, each made a profit of 10 dinars, and thus the problem set by the envious qadi of Damascus was solved."

So, yes, the key is to use the stated prices sequentially; and, no, no explanation is given of how to solve it. But it would seem that we are to take the meaning as, “Using the two prices I suggested, find appropriate quantities so that they each get the same amount of money.” If it had been stated that way, it would have been still a challenge, but less frustrating.

The first is by Doctor Ian in 2002:

Avoiding Careless Mistakes

How to avoid careless mistakes? I have tried do as many problems as possible, but mistakes are constantly made just because of carelessness!

Doctor Ian makes three main points, each illustrated by detailed examples that I will omit:

Once I became convinced of the importance of what he calls 'the habit of correctness and precision', I found that I started adopting it quite naturally, without much effort at all.

Write more steps than you think you need to, so that you can see what you’ve done. (My personal version of this is: Until you *write* something, you can’t tell whether it’s *wrong*; writing clarifies what you are thinking.)

Having said that, it seems to me that many, if not most, of the careless mistakes that we see here at Ask Dr. Math are caused by trying to do too many steps at once. ... A good rule to keep in mind is that you can't make mistakes fast enough to get a correct answer. :^D

(That is, make sure it really does what you were asked to do.)

I also make it a point _always_ to check my answer, after I think I've found it. At first this is something you have to remember to do, but after a while it becomes natural.

In addition, doing a check with actual numbers sometimes clarifies what you should have done with variables.

A second kind of careless error, which sort of falls into the same category, is caused by translating story problems too quickly into equations. ...

The less you write, the more easily you can make a mistake, but the more you write, the more places there are to make a mistake! (I just said this to a student yesterday!)

On the one hand, working 'in place' is an easy way to get sucked into making careless mistakes; but on the other hand, the more times you copy something, the greater the chance that you're going to copy it incorrectly. With a computer, you can simply copy the old line and change it, which is what I've done here. Without a computer, you have to use your judgment about whether working in place or copying is more likely to cause a problem. To make that judgment, you have to have some idea about the kinds of mistakes that you tend to make, and how often you make them.

Which leads to my final recommendation, which is that you might want to keep a notebook of the careless mistakes you make. Keeping track of them would allow you to observe patterns, and figuring out what you're doing is the first step towards changing _any_ kind of behavior.

The second answer was written by Doctor Rick to a different student just a few days later:

Careless Mistakes

No matter how hard I try it's not good enough. I understand the material - it's the other stuff, like when it asked me for the range, domain, and inverse of ordered pairs, I just put the inverse because I misread the question. Or like when I had the right answer on scrap paper but I left off part of the answer when I wrote it on the answer sheet. It makes me feel really stupid. I've had this problem with careless mistakes since 6th grade, but it's getting worse. Proofreading my work helps very little and sometimes I don't have time when I'm done with the test. What do I do?

Doctor Rick has two main suggestions for this student, who clearly is a diligent student with good observations of his own.

(I ask my own students to rework any problems they get wrong on a test and turn it in for partial extra credit; you should do this yourself even if there is no external reward.)

You have observed some specific kinds of mistakes you make, and that's a great way to start. One step in problem solving that many people forget about - even after checking your work, which is easy enough to forget - is to look back on what you've done and see what you can learn from it. Sometimes you see something positive that you'll be able to use again - a trick that worked, or a pattern you saw ("when I see this, I can try that"). Other times, as in your case, you see something to avoid next time. The question is, how can you avoid these sorts of mistakes?

(I commonly list the “givens” and the “goals”, with blanks next to the latter; I may also underline these things in the original problem.)

You say that you misread a question, so you didn't give all the answers that you should have. This is a reminder that another important step in problem solving - the first, and sometimes the most important - is to ask, "What am I supposed to find?" Try making a ritual of starting a problem by listing exactly what you are supposed to find. Then when you finish your work, write each answer next to the list, or at least check off your list as you copy the answers. This will also solve your other problem: that you forgot to copy all the answers from your scratch paper.

Finally, in 2016, Doctor Floor gave a very helpful answer to a long and thoughtful question:

Coping with Carelessness: Strategies, Stresses, and Mindsets Our son, a high school junior, is currently taking Advanced Placement BC Calculus. He has always excelled in math (and all other subjects), and never had to study much, because he easily understands concepts. He has, however, always had a tendency of making careless mistakes. This year this tendency has become a particular problem, with his grades suffering for the first time. Part of his tests are multiple choice -- no calculator allowed. Here, he does not have to show any work. But this part needs to be turned in prior to starting on the next section, one where calculators are allowed. On the calculator section, he does need to show his work; and the brevity of the short answers often belies the many intermediary steps they required. ... My son's teacher says all his mistakes have been careless ones, not conceptual ones: he makes simple calculation errors, or misreads questions, or omits units, or runs out of time, depriving him of the opportunity to check his work.

Doctor Floor lists three observations:

Smart kids often have learning strategies that seem lazy or careless, due to the fact that they haven't been really challenged in younger years. Because of this lack of challenge, there has never been any motivation to develop learning strategies or solving strategies. Easy tasks pave the way for good grades, and the cycle reinforces itself. But at some point in a school career, relying on talent alone turns out to be not enough. Even worse, teachers often think that by high school, high-performing kids must have good strategies. ... So the key is in his homework. Do not only complete homework, but *review it.* Learn from your mistakes. Even when you do it correctly, wonder if you could have done it smarter, or quicker. If you encounter a trick or novel thought, make a note. Be concentrated and targeted in your review.

People cope with this very differently, so consider these as only the most general of observations:

- Be confident. If you know you are well-prepared, there is no need to be stressed. ...
- At the same time, be realistic about what to expect. This is particularly important if the test turns out to be more difficult than you thought, or time pressure is higher than you thought.
- Force yourself not to think of any consequences while taking the test. Just take your test, and stick with taking the test. Other thoughts will only break your concentration. Prepare ahead of time so that if you do lose concentration, you already have a way to re-focus and get back on track. ...
- If you can show your abilities in a test, that is the best you can do.

Quite a few kids have a mindset that holds them back. It is called fixed mindset, and comes with the thought, "It doesn't really matter if I do the homework or not; either I will understand it or I won't." Often kids who are labeled as "smart" or "intelligent" develop such a mindset. These kids think their intelligence is fixed, learning is understanding, and that developing intelligence does not exist (e.g., by homework). By contrast, a "growth mindset" posits that developing intelligence is possible. Research shows that there is indeed a correlation between mindset and intellectual development.

The parent responded:

We can't thank you enough for your response! It was spot-on! I feel all your points are excellent and will be very useful. I can't wait to share your response with our son (he came home with the flu yesterday), because I think he will now understand the underlying issues, make the adjustments needed, and as a result cope better. The difficulty will be for him to accept that he needs more practice even if he already "understands" concepts, and to figure out how to change his habits. While we were familiar with "smart kid" issues and the danger of not being challenged, we had not anticipated that this would become THE instant when everything would back-fire.

As you can see, there are a wide variety of ways in which careless errors can arise; each of us has to be aware of our own personal issues, and make a strategy. I hope these three discussions will help others.

I’ll add something I’ve been telling students recently, which summarizes many of the points made above: In order to solve a problem well, you have to

- Think.
- Write what you thought.
- Think about what you wrote.
- Then fix it!

Here is a recent question from Fida, another long-time “patient” of ours at Ask Dr. Math:

I hope you won’t mind answering a question.

I have seen people treating dy/dx as a fraction although it is not. I am guessing it is a handy shortcut. But what is the proof for that?

e.g. in U-substitution

Here is a fragment where they do this trick

u = sin x
du/dx = cos x
du = cos x dx
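Before we dig into why this "trick" is legitimate, it may help to see that the relation it asserts really does hold numerically. The following sketch (my own illustration, not part of Fida's question) checks that for u = sin x, the difference quotient du/dx approaches cos x as the step shrinks, which is all the fraction-like manipulation is ultimately encoding:

```python
import math

# For u = sin(x), the "fraction" du/dx = cos(x) claims that the ratio
# (u(x + h) - u(x)) / h approaches cos(x) as h shrinks toward 0.
def du_over_dx(x, h=1e-6):
    """Finite-difference approximation to du/dx for u = sin(x)."""
    return (math.sin(x + h) - math.sin(x)) / h

x = 0.7
approx = du_over_dx(x)
exact = math.cos(x)
print(approx, exact)  # the two values agree to about six decimal places
```

Of course, a numerical check is not a proof; it only shows that the notation is keeping its promise in this instance. The real question of why du = cos x dx is a valid step is what the answer takes up.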