First, we have a 2004 request for a proof:

Proof of the Chain Rule

I am using the chain rule (dy/dx = dy/du * du/dx) in my math class, and I would like to see it proved, which we don't do in class. My teacher told me the formal proof is an epsilon-delta proof, and in my spare time I have studied that kind of proof a little (using your splendid archives) so I can understand this proof.

We covered epsilon-delta proofs in Epsilons, Deltas, and Limits — Oh, My!

Doctor Fenton answered:

Hi Philip,

Thanks for writing to Dr. Math. While epsilons and deltas are necessary to prove the technical details, you may already have accepted the necessary limit theorems for a proof: the limit of a product is the product of the limits; and if a function f is continuous at y = g(a), while the function g is continuous at a, then f(g(x)) is continuous at x = a (assuming that the composite function f(g(x)) is defined in some open interval containing x = a).

These limit theorems, and others, are proved from the definition. We’ll be using them, and the definition of the derivative in terms of limits. The key idea will be that both theorems apply when it is known that the functions involved are **continuous in an interval around the point of interest**.

The Chain Rule applies to composite functions f(g(x)), when g is differentiable (and therefore continuous) at x = a, and f is differentiable at y = g(a). To investigate differentiability, the first thing to do is to examine the difference quotient $$\frac{f(g(x))-f(g(a))}{x-a},$$ knowing that $$\lim_{y\to g(a)}\frac{f(y)-f(g(a))}{y-g(a)}=f'(g(a))$$ and $$\lim_{x\to a}\frac{g(x)-g(a)}{x-a}=g'(a).$$

On the surface, it looks like the chain rule is just a matter of **simplifying the product of fractions**:

The "natural" approach is to write $$\frac{f(g(x))-f(g(a))}{x-a}=\frac{f(g(x))-f(g(a))}{g(x)-g(a)}\cdot\frac{g(x)-g(a)}{x-a}$$ and take the limit of both sides. However, there are functions (such as g(x) = x^2 sin(1/x), with g(0) = 0, for a = 0) for which g(x) - g(a) is 0 infinitely often as x->a. This behavior cannot happen if g'(a) is non-zero, so this proof would work in that case. However, if you want to cover the case g'(a) = 0 as well, you have to be more careful.

Here is a graph of this special function \(g(x)=x^2\sin\left(\frac{1}{x}\right)\):

If we keep zooming in, it will keep looking like this, with *y* being zero over and over as *x* approaches 0. Consequently, that product of fractions can’t be simplified as needed on any interval around zero, so we can’t apply those limit theorems!
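To see the "zero infinitely often" behavior concretely, here is a small sketch (the function names are mine, for illustration only). The zeros of \(g\) away from 0 are \(x=1/(k\pi)\), and for any \(\delta>0\), choosing \(k>1/(\pi\delta)\) puts one of them inside \((0,\delta)\). In floating point, \(g\) at these points is only approximately 0, so the printed values are tiny rather than exactly zero.

```python
import math

def g(x: float) -> float:
    """Doctor Fenton's example g(x) = x^2 sin(1/x), extended by g(0) = 0."""
    return 0.0 if x == 0 else x * x * math.sin(1 / x)

def zero_within(delta: float) -> float:
    """Return a zero x = 1/(k*pi) of g with 0 < x < delta.

    Any k > 1/(pi*delta) works, because then k*pi > 1/delta.
    """
    k = math.floor(1 / (math.pi * delta)) + 1
    return 1 / (k * math.pi)

# However small the interval (0, delta), it still contains a point where g(x) = g(0).
for delta in (0.1, 1e-3, 1e-6):
    x = zero_within(delta)
    print(f"delta = {delta:g}:  x = {x:.3e},  g(x) = {g(x):.1e}  (zero up to rounding)")
```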

So we need a workaround, which involves defining a new function:

Let $$F(y)=\frac{f(y)-f(g(a))}{y-g(a)}\quad\text{for } y\ne g(a),$$ and define F(g(a)) = f'(g(a)). Then F(y) is continuous at y = g(a), so F(g(x)) is continuous at x = a. Then we can write $$\frac{f(g(x))-f(g(a))}{x-a}=F(g(x))\cdot\frac{g(x)-g(a)}{x-a},$$ and this equation is true even if g(x) = g(a), since both sides are 0 (and the right side is no longer undefined, as it was before). Now, take the limit as x->a, and you obtain the Chain Rule.

This function \(F(y)\) has been defined separately for \(y\ne g(a)\) and for \(y=g(a)\), in a way that makes it continuous, so the limit theorems can be applied.
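A numerical sanity check may help here (it is not a proof). The sketch below assumes, purely for illustration, \(f(y)=\sin y\) together with Doctor Fenton's \(g(x)=x^2\sin(1/x)\), \(g(0)=0\), at \(a=0\); the names `f`, `g`, `F`, `lhs`, `rhs` are mine. It builds \(F\) exactly as defined above, plugging the hole with \(F(g(a))=f'(g(a))=\cos 0=1\), and checks that both sides of the identity agree and shrink toward \(f'(g(0))\cdot g'(0)=0\).

```python
import math

# Illustrative choice (an assumption, not from the original answer):
# f(y) = sin(y) and g(x) = x^2 sin(1/x) with g(0) = 0, at the point a = 0.
A = 0.0

def f(y: float) -> float:
    return math.sin(y)

def g(x: float) -> float:
    return 0.0 if x == 0 else x * x * math.sin(1 / x)

GA = g(A)  # g(a) = 0

def F(y: float) -> float:
    """Difference quotient of f at g(a), with the hole plugged by F(g(a)) = f'(g(a))."""
    if y == GA:
        return math.cos(GA)  # f'(y) = cos(y) for f = sin
    return (f(y) - f(GA)) / (y - GA)

def lhs(x: float) -> float:
    """Difference quotient of the composite f(g(x)) at a."""
    return (f(g(x)) - f(GA)) / (x - A)

def rhs(x: float) -> float:
    """F(g(x)) * (g(x) - g(a)) / (x - a): defined even where g(x) = g(a)."""
    return F(g(x)) * (g(x) - GA) / (x - A)

for k in range(1, 7):
    h = 10.0 ** -k
    print(f"h = {h:.0e}   lhs = {lhs(h): .3e}   rhs = {rhs(h): .3e}")
```

Both columns agree and are bounded by \(h\) in size, consistent with the chain rule's prediction that the derivative of \(f(g(x))\) at 0 is 0.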

We’ll be digging further into that as we go!

In 2001, a student, Mahesh, had seen an equivalent proof, and didn’t fully understand it:

Assigning a Piecewise Function in a Proof

Hi,

As I was trying to teach my niece the chain rule, I came across a step which I have trouble understanding. I'll narrate the proof until the step with which I have the problem.

Proof: $$(f\circ g)'(c)=\lim_{x\to c}\frac{f\circ g(x)-f\circ g(c)}{x-c}=\lim_{x\to c}\frac{f(g(x))-f(g(c))}{x-c}$$ We must show that the limit on the right hand side has the value f'(g(c))·g'(c). Note that if we knew that g(x) - g(c) is not equal to zero for all x near c, x not equal to c, we could write: $$\frac{f(g(x))-f(g(c))}{x-c}=\frac{f(g(x))-f(g(c))}{g(x)-g(c)}\cdot\frac{g(x)-g(c)}{x-c}$$ and show that the first factor on the right has limit f'(g(c)) and the other has limit g'(c). The problem is that g(x) - g(c) could be zero even if x is not equal to c.

(Now the step I do not understand.)

We introduce a function: $$e(u)=\begin{cases}\dfrac{f(u)-f(g(c))}{u-g(c)}-f'(g(c)), & u\ne g(c)\\ 0, & u=g(c)\end{cases}$$ How can we say that e(u) is 0 when u = g(c)? The only way I thought that could happen is if the numerator of the first term is (u - g(c))·f'(g(c)) = f(u) - f(g(c)). Even so, how can I be sure that f(u) - f(g(c)) can have (u - g(c)) and f'(g(c)) as its factors? Can we guarantee that this manipulation will always be true? Even though we have introduced e(u), how can we have a rule that conveniently says e(u) = 0 when u = g(c)?

I have seen other proofs, but I would like to understand this step. When I was in school I probably just accepted this step and went on.

Comparing to the first proof, we see that this \(e(u)\) amounts to \(F(y)-f'(g(a))\) there.

Mahesh appears to be thinking that \(e(u)=0\) is a *conclusion*, or an *assumption*, rather than a *choice*.

Doctor Schwa answered:

>How can we say that e(u) is 0 when u = g(c)?

This is simply the DEFINITION we have given e(u). Depending on the value of u, it has different definitions. It's like saying: f(x) = {x^2 when x is positive, x when x is negative}. There's no *reason* f has to be defined that way; that's just the definition.

In math, a definition is where something first comes into existence; at that point, it can be defined in any way we like. No *justification* is needed. On the other hand, there is often a *reason* for defining it that way …

>Can we guarantee that this manipulation will always be true?

So you see there's no manipulation to be done. It's just the definition of a new function. We choose to define it any way we want to. We define it this particular way to patch up the hole at g(x) = g(c).

This is the reason: Making this definition gives us a continuous function, which we can work with; it plugs a removable discontinuity.

Thanks for your excellent question and your careful attention to detail. That's an important part of mathematics! You're right that on seeing it for the first time most people probably just accept this step on faith, memorize it, or ignore it... but for future mathematicians, this kind of thinking is important.

Too many students do just skip over the hard parts of a proof.

The recent question about this, which called my attention to the two questions above (which I’d already found in preparing the other post), came from Kalyan in mid-January, as part of a longer thread:

Doctor,

Now I am proving the derivative of a composite function. I used the derivation like this. But the proof presented in the book is this:

In the first part of the derivation |Δu| is taken. Δu can also be negative; there is nothing wrong with that.

Why are they concerned about its magnitude not its sign?

Why is the author putting so much effort into the proof when it can be proved the simple way I did? Of course, I have missed out many details, but using things like |Δu| and η wastes a lot of time, doesn’t it?

(I located the book, Leithold’s *Calculus with Analytic Geometry*, 1976, p. 138, in the Internet Archive, which has a better image than Kalyan’s, and am using that here. He initially included only the part I’ve shown above; we’ll see the rest later.)

Leithold uses a somewhat less familiar notation, in which \(D_xy\) represents the derivative of *y* with respect to *x*, so that his $$D_xy=D_uy\cdot D_xu$$ means $$\frac{dy}{dx}=\frac{dy}{du}\cdot\frac{du}{dx}$$

The variable \(\eta\) represents the difference between the actual difference quotient and its limit, the derivative, so its own limit is zero. This is then viewed as the function \(F(\Delta u)\), which is equivalent to \(e(u)\) in the previous version of the proof. As there, it is extended by “plugging the hole” at \(\Delta u=0\) with a piecewise definition.

Kalyan’s attempted proof is essentially the “natural” (that is, simple, or naive) proof from above.

I answered the second, larger question, though the discussion had been with Doctor Rick, because the topic was fresh in my mind:

Hi, Kalyan.

I’m not going to answer your specific question, which Doctor Rick can deal with. Rather I want to give you some additional information which touches on your question about why details matter in proofs.

I recently made a post about the chain rule, but chose not to include a couple of answers about its proof, which you are looking at. I did mention that the rule can be thought of as “obvious”, in the sense that you can easily convince yourself it makes sense; for many purposes, that is enough. But not to a mathematician! I explained that in Why Do We Need Proofs?: Mathematicians have been fooled in the past, and long ago vowed, “Never again!” In fact, a problem you just finished about the derivative of sgn(x) is an example where a “fact” that looks reasonable on the surface turns out to be wrong, precisely because it fails in special cases.

Here are the two answers about the proof that I chose not to include; note that the first points out that a detail is necessary, and the second explains more about that same detail.

Here I copied the two questions we looked at above, and added

I think these are slight variations on the proof you are asking about.

I hadn’t yet seen the whole proof, so I couldn’t say more.

Doctor Rick answered the question I’d skipped:

What the book says is:

Hence, when |Δu| is small (close to zero but not equal to zero), the difference \(\Delta y/\Delta u-D_uy\) …

All this is saying is that Δu itself is “close to zero”, on either side (positive or negative), but not exactly zero. A point Δu is “close to zero” if its magnitude |Δu| is small. As you note, Δu can be positive or negative and be “close to zero”, so it’s the magnitude, not the sign, that matters.

This is equivalent to the \(|x-a|<\delta\) in the definition of a limit.

Why are they concerned about the magnitude of Δu? Well, they are primarily concerned to define a continuous function F(Δu) on a neighborhood of Δu = 0, that is, on an open interval of Δu values that includes Δu = 0.

This function \(F(\Delta u)\) represents the difference between the difference quotient and the derivative, which should be small when \(\Delta u\) is near zero.

You haven’t shown the part of the proof where this is used – and as you know, the way to answer a question, “Why is this done in a proof (or other problem)?”, is to look at how it is used later on! Therefore I cannot be sure of the full answer in your specific case. However, it certainly appears that what is done here is essentially the same as in each of the proofs Doctor Peterson just showed you. They both introduce a function that is continuous on a neighborhood of Δu = 0, so that essentially what you did (in the simple version) will still work if u = g(x) is not sufficiently well-behaved.

If you want to discuss this further, please show the rest of the proof in your book.

This is a key in trying to read a proof: Often the reason for a step is revealed only later, when it is needed.

Kalyan now showed the rest of the proof:

Hello Doctor,

I am actually asking about the involvement of two things:

1) F(Δu)

2) The definition of F(Δu).

Of course, these two elements did not come to the proof inventor accidentally. (In some cases that has happened, but that is a different topic.) What is the reason behind the inclusion of the definition the way it is?

PS: I also have a link written by a professor of Whitman College. I shall refer to it once the above gets cleared.

In this conclusion of the proof (which we hadn’t really seen before), we make another error function \(\phi(\Delta x)\) similar to \(F(\Delta u)\), and use that to define \(G(\Delta x)\).

Doctor Rick now replied, first on the origin of the proof:

Did you read what Doctor Peterson showed you? As I said, those two proofs do essentially the same thing as your proof, so it’s clear that your author did not need to come up with the idea himself. He just had to adapt the idea to his particular notation and set of theorems. I don’t know the history, but I can imagine that when calculus was first being put on a solid mathematical foundation (150 or more years after it was first invented!), the first proof of the chain rule may have been like yours; but then someone noticed a “hole” in the proof, which likely caused considerable concern. I imagine it may have been some time before someone came up with a “patch” for the proof, easing mathematicians’ minds!

The solution is far from obvious; to be honest, I have a hard time fully comprehending the proof myself, in the several versions I’ve looked at.

Mathematics is built one step at a time, generation by generation, and good ideas are passed on to the next.

Next, on the reason:

The author of your text does not explain the reason that this extra work is needed (unless this was given before the proof). The answers Doctor Peterson showed you do give at least a brief explanation. The document whose link you provided this time also goes into detail on what is wrong with the simpler proof that he presents first, as well as how to fix it.

That document, by Leo Goldmakher, is well worth reading. It shows a proof from Spivak, which uses \(\Phi(h)\) equivalent to our \(F(y)\) and \(e(u)\), and is more complete than our answers above, but more readable than Leithold’s.

Let’s look at the reasons that the simple proof you wrote isn’t sufficient. That proof was:

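$$\begin{aligned}\frac{d}{dx}f(u(x))&=\lim_{\Delta x\to0}\frac{f(u+\Delta u)-f(u)}{\Delta x}\\&=\lim_{\Delta u\to0}\frac{f(u+\Delta u)-f(u)}{\Delta u}\cdot\lim_{\Delta x\to0}\frac{\Delta u}{\Delta x}\\&=f'(u)\,\frac{du}{dx}\end{aligned}$$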

This can go wrong if Δu is zero for some Δx other than zero. Then the second line above divides by zero (in fact it becomes 0/0, since f(u+0) – f(u) = 0), so it is undefined. This won’t really be a problem if there is some Δx below which this never happens; in that case we still have a valid limit, because it’s only “sufficiently small” Δx that matters in the definition of a limit. But, as Doctor Fenton pointed out, there are rather perverse functions such as \(f(x)=x^2\sin(1/x)\) that have a sequence of zeros that get “infinitely close” to zero. Thus there is no open interval around Δx = 0 within which Δu ≠ 0. This is the kind of case where we need to do something else.

Again, limits don’t care what happens **at** the target location (here, \(\Delta x=0\)), or at any distance **away** from there, but only in some **sufficiently small open interval** around the target; this is implied by the \(\delta\) in the definition of a limit.

(One could just add a condition to the Chain Rule Theorem to rule out such functions; but once a way was found to prove that the theorem still holds even for such functions, it would be inelegant to have an ugly condition in the theorem when it isn’t absolutely necessary.)

Why say “for differentiable functions *f* and *g*, where \(g(x)\ne g(a)\) for \(x\ne a\) near *a* …” when we don’t need to? Some textbooks will prove the easy case and just tell students that the general case can be proved; but for students who are capable of understanding the proof with a little stretching of their minds, this would be a disservice.

A second gap in the simple proof is the change from \(\lim_{\Delta x\to0}\) to \(\lim_{\Delta u\to0}\) in the factor involving \(f(u+\Delta u)\). We need to be sure this is justified.

Now, what you are asking, I think, is a bit more specific: What is the reason for defining F(Δu) as it is? Having now seen the whole proof, I can point to the end of the proof where it says:

Because \(\lim_{\Delta x\to0}G(\Delta x)=0\) and F is continuous at 0 (remember that we made it so), we can apply Theorem 2.6.5 to the right side of the above equation, and we have $$\lim_{\Delta x\to0}F(\Delta u)=F\Big(\lim_{\Delta x\to0}G(\Delta x)\Big)=F(0)=0$$

What is this Theorem 2.6.5? What we’re doing here is something that I did with an earlier question. You asked if certain reasoning was valid, which amounted to moving a limit inside a function, just as we see here. I responded by stating a theorem that you were implicitly assuming to be true. I then determined under what conditions that hoped-for theorem is valid: $$\lim_{x\to a}f(g(x))=f\Big(\lim_{x\to a}g(x)\Big)$$ if f is continuous at \(\lim_{x\to a}g(x)\).

It is easy to assume something when you are approaching math informally, and the assumption is *usually* true; but we can’t do that in a *proof*. That is what the simple proof does.

Here is Theorem 2.6.5 from the book:

This tells us we can move a limit inside a **continuous** function. And since *F* was defined so as to be continuous, this makes the proof work.
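The continuity hypothesis is not a formality. A standard counterexample (my own illustration, not taken from Leithold) uses a function with a removable discontinuity whose hole has *not* been plugged:

$$f(y)=\begin{cases}0, & y\ne0\\ 1, & y=0\end{cases},\qquad g(x)=x:\qquad \lim_{x\to0}f(g(x))=0\ne1=f\Big(\lim_{x\to0}g(x)\Big).$$

Had *F* been assigned any value at 0 other than its limit there (or been left undefined), the final step of the proof would fail in exactly this way.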

First, this is from Sheri in 1996:

Probability that a Random Integer...

I started trying to figure out what the probability that an integer chosen randomly would be an integer multiple of a given integer "n" (or perhaps of a prime "p")... and then the probability that it would be a multiple of, say, the integers 2...7, and so on. I think I got stuck on trying to figure out how to formulate the "or" probabilities and add things up properly. I think my idea was to generate a sequence P(n) where P(n) would be the probability a random integer would be a multiple of at least one of the integers from 2 up to n. Can you head me in the right direction?

Doctor Tom answered:

Well, to be precise (and that's what we mathematicians have to be!), I have to know exactly what you mean by "an integer chosen randomly". If you're talking about the entire infinite set of integers, there is no way to do this without some sort of a distribution function over the integers, and there is no such function that gives an "equal probability" for all integers.

We’ll see more about why as we go along; the basic idea is that if every element of a finite population of *N* items is equally likely, then the probability of one item being chosen is 1/*N*. But if *N* is infinite, that would make each probability zero – and so would the probability of any finite set of them. We can’t work with that.
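The argument can be made precise in one line, assuming (as standard probability theory does) that probabilities are countably additive. If every integer had the same probability \(p\), then

$$1=P(\mathbb{Z})=\sum_{n\in\mathbb{Z}}P(\{n\})=\sum_{n\in\mathbb{Z}}p=\begin{cases}0, & p=0,\\ \infty, & p>0,\end{cases}$$

which is impossible either way, so no uniform distribution on the integers exists.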

But there’s a workaround for this problem:

To make things precise, here's what I'll assume you mean: If M is a very large integer, and we randomly (with equal probability) choose an integer between 0 and M, what's the probability that it is divisible by n? For any fixed M, you can work this out, and then if we take the limit of these probabilities as M goes to infinity, I think that will be what you want.

So if your question just concerns a single number n, the answer is that the limiting probability is 1/n. For large M, roughly 1/n of the integers less than M are divisible by n, and the error is smaller and smaller as M gets larger and larger.

So, if *n* is, say, 6, then every sixth integer is divisible by 6, which will be true no matter how large *M* is; so the probability of a multiple of 6 is \(\frac{1}{6}\).

He went on to answer the rest of the question, which was essentially just to use the LCM of all the numbers given in place of our single *n*.
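Here is a small sketch of both steps; it is my own illustration rather than Doctor Tom's text, the function names are hypothetical, and `math.lcm` with several arguments needs Python 3.9 or later. The first function counts multiples of *n* among 0..*M* directly, so you can watch the proportion approach 1/*n*. The second uses the fact that divisibility by 2, ..., *n* repeats with period lcm(2, ..., *n*), so the limiting proportion for "a multiple of at least one of them" (Sheri's P(*n*)) can be counted exactly within one period.

```python
import math
from fractions import Fraction

def fraction_divisible(n: int, M: int) -> Fraction:
    """Proportion of the integers 0..M that are divisible by n (by direct count)."""
    count = sum(1 for k in range(M + 1) if k % n == 0)
    return Fraction(count, M + 1)

def prob_multiple_of_some(n: int) -> Fraction:
    """Limiting probability that a 'random' integer is a multiple of at least one of 2..n.

    Divisibility by 2..n repeats with period L = lcm(2, ..., n), so the limiting
    proportion equals the proportion within a single block 0..L-1.
    """
    L = math.lcm(*range(2, n + 1))
    hits = sum(1 for k in range(L) if any(k % d == 0 for d in range(2, n + 1)))
    return Fraction(hits, L)

for M in (10, 100, 10_000):
    print(M, float(fraction_divisible(6, M)))   # tends to 1/6 as M grows
for n in range(2, 8):
    print(n, prob_multiple_of_some(n))
```

For example, P(3) comes out to 2/3, matching inclusion-exclusion: 1/2 + 1/3 - 1/6.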

Next, from 2003:

Probability in the Infinite Plane

Three randomly drawn lines intersect so as to form a triangle on an infinite plane. What is the probability that a randomly selected point will fall inside that triangle? Should points falling on one of the three lines be considered as a possibility? Considering just one of the lines, I believe that there are three possibilities: above, on, or below the line. Thus each probability for the interior of the triangle would be 1/3, and the overall probability would be 1/27. My teacher disagrees.

That first problem had an infinite but discrete sample space. Here, it is infinite and continuous, compounding the difficulty.

Jim, though, has several issues, starting with what probability is.

Doctor Wallace answered:

Hello Jim,

This is an interesting problem. It intrigued me enough to go and do some research to further my own knowledge of such problems. I would like to share with you what I discovered.

First, your problem seems to fit into the category of "geometric probability," that is, probability that is computed using the principles of geometry and models of area. Here is a sample:

A triangle of base 10 and height 5 is drawn on the coordinate plane. It is surrounded by a rectangle of area 100. What is the probability that a randomly selected point inside the rectangle lies within the triangle?

We express this probability as the ratio of the area of the triangle to the area of the rectangle. This would be 25/100 or 1/4, which makes sense, since the triangle comprises 1/4 of the area of the rectangle.

Geometric probability is what we used last time, and will explore more below. There are infinitely many points in the rectangle, so we can’t just divide the *number* of points in the triangle by the number in the rectangle; but we can assume that probability is the ratio of *areas*.
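The area model can be checked by simulation. The sketch below is my own illustration; the particular triangle with vertices (0, 0), (10, 0), (5, 5) inside a 10-by-10 square is an assumption (any triangle of base 10 and height 5 inside a rectangle of area 100 would do). It compares the exact area ratio with the fraction of uniformly random points that land in the triangle.

```python
import random

# Assumed placement: a 10-by-10 square (area 100) containing the triangle with
# vertices (0, 0), (10, 0), (5, 5), which has base 10 and height 5.
SIDE = 10.0
TRI = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]

def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign tells which side of line oa point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def triangle_area(p, q, r):
    return abs(cross(p, q, r)) / 2

def inside(pt, tri):
    """True if pt is in the closed triangle (never strictly on opposite sides of two edges)."""
    a, b, c = tri
    d = (cross(a, b, pt), cross(b, c, pt), cross(c, a, pt))
    return not (min(d) < 0 < max(d))

def estimate(trials=200_000, seed=1):
    """Fraction of uniform random points in the square that land in the triangle."""
    rng = random.Random(seed)
    hits = sum(inside((rng.uniform(0, SIDE), rng.uniform(0, SIDE)), TRI) for _ in range(trials))
    return hits / trials

exact = triangle_area(*TRI) / SIDE**2
print("area ratio:", exact, " simulated:", estimate())
```

Points that fall exactly on an edge are counted as inside, but they occur with probability zero, which is the same reason the "on the line" case drops out below.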

Examining your problem about the lines on an infinite plane, I do not think that the approach of trying to calculate the probability of the random point lying on, above, or below the triangle's sides will yield anything meaningful to the larger problem. 1/27 would be the correct answer to (1/3) cubed, but that assumes that the probabilities that the point lies above, on, or under the line are equal. Are they?

This is a common issue for beginners to probability. Similarly, the fact that a coin could land heads, tails, or on edge does not mean that the probability is 1/3, because those are not equally likely. This is discussed in How Do You Know That Events Are Equally Likely?

As in the first problem, we can start with a **finite** region, which still contains infinitely many points, but at least has finite area:

Simplify the problem a little. Suppose you were to draw a line on the wall of your room, cutting the wall in half. Now throw a dart at random at the wall. Would you expect the probability of the dart landing ON the line to be equal to the probability of it landing above or below? Surely not. There is more "space" for the dart to land above and below. There is a small probability that it will land on the line, yes, but it vanishes when considered against the larger spaces of the rest of the wall.

This is why an "area model" is useful. If the line is drawn halfway across the wall, we would expect the probability to be about 1/2 for above or below, because the areas are about equal.

We can ignore points on the line itself, because its area is zero (in principle; and very near zero for an actual drawn line). But the probability of hitting above the line is 1/2 only if that area is 1/2 of the wall.

Now back to your triangle. Suppose we forget about the point ON the line. Does the point have an equal chance, 1/2 and 1/2, of landing above or below? Yes. But you now have three lines, and the probability of one point landing below each of them would be 1/2 individually, yes. So the probability would be 1/2 cubed, or 1/8, of landing below all of them. But again, only if the three events were independent. When you have the lines form a triangle, this is no longer the same question! The lines are interacting with each other, and the resulting area of the triangle can now vary considerably.

The three lines will not all cut the wall in half; and although there will be 7 (not 8!) regions, they will not have equal areas, so the probability will be neither 1/8 nor 1/7:

Think back to the wall of your room again. Imagine your three lines crossing it, but sloped in such a way that the triangle formed is a very, very tiny one. Now imagine another wall, again with three lines, but sloped so that the triangle formed is very large; it could even take up most of the area of the wall. Would you expect a randomly thrown dart to land in each of the two triangles with equal probability? Surely not. Again, the randomly selected point will have a greater chance of landing in the triangle with the larger area. So 1/8 can't be meaningful any more, since we would get 1/8 for the probability of either triangle, or, for that matter, for any triangle we drew. Again, this is because the 1/8 is the answer to a completely different problem.

Let's return finally to your original problem. You asked about a triangle on an infinite wall (plane). You also gave no specifics about the triangle. With three random lines, it is possible to form a triangle of any area we like. However, the triangle formed will definitely have a finite area. It may be very, very large, but it will definitely be a bounded area. The plane, however, is unbounded. So if we try the area method for probability now, we would get a ratio of finite to infinite. Imagine the wall of your room again. The wall is infinitely large. It is limitless. It goes on and on and on... And somewhere on it is a finite triangle, formed by your lines.

This triangle, no matter how large its finite area, pales in comparison to limitless infinity. The triangle is swallowed up into boundlessness like a tiny drop in a vast ocean. Now what is the probability that your randomly selected point, somewhere on that vast plane, lands in the triangle? Yes... vanishingly small. Effectively zero. Will it really be zero? Theoretically no, but practically yes.

Actually … theoretically yes! A zero probability does not mean absolute impossibility in an infinite setting like this.

You can think of the second picture above, with the smaller triangle, as the same triangle on a larger wall. We can see that as the wall gets larger, the ratio of area taken up by the triangle approaches zero.
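For illustration, suppose (an assumption, since Jim's triangle is unspecified) that the triangle has area 25, as in the rectangle example, and that the wall is a square of side \(s\ge10\) containing it. Then the area model gives

$$P_s=\frac{25}{s^2},\qquad P_{10}=\frac14,\qquad P_{100}=\frac{1}{400},\qquad \lim_{s\to\infty}P_s=0.$$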

This result bothered me, since the whole question is a theoretical one. We can't really investigate a true plane, since there is no such thing as an actual infinity. We would have to bound the plane somewhere, and then you will have an actual area for the denominator of the ratio, and so you would be able to calculate the probability. But you would also have to know the area of the triangle formed.

Technically, as mentioned in the first answer, the question itself is meaningless, as there is no way to define a uniform distribution on the entire plane, in order to say we are choosing a point randomly. A proper problem about a random point lying in a random triangle would have to be in a bounded region; but then we’d have to think more about what it means for the lines to be random. We won’t be going there.

The story doesn't stop there, however. I went digging on the Internet and I found a paper published by two researchers [Falk and Samuel-Cahn] at the Hebrew University of Jerusalem. They work at the Center for Rationality there, which studies interactive decision theory. This is an exciting field. The paper I found is in Microsoft Word format, and can be found at this URL: http://www.ma.huji.ac.il/~ranb/DPs/dp235.doc

In this paper, they discuss a problem that was posed by Lewis Carroll, author of *Alice's Adventures in Wonderland* and a mathematician and logician of some renown. His problem is not the same as yours, but the two share a very similar characteristic: the idea of the infinite plane.

This is an alternative interpretation of the acute triangle question, which I mentioned in passing last time, when I said, “A third option might be to **randomly choose three vertices**, but again we would need to restrict the region somehow.”

Carroll posed this:

Three Points are taken at random on an infinite Plane. Find the chance of their being the vertices of an obtuse-angled Triangle.

The two researchers show Carroll's answer to the problem, and they investigate his underlying assumptions. These have a direct bearing on your problem about the point lying in the triangle formed by three lines on an infinite plane. They note that there are fundamental contradictions inherent in assuming things about the infinite plane. They say that Carroll's answer was the right answer to a different problem, and that seems to be exactly what happened to you.

The article I mentioned last time also dealt with this Lewis Carroll version of the acute triangle problem. Here is part of their explanation:

Incidentally, Charles Dodgson posed the question in this form in his 1893 book of mathematical puzzles, entitled “Pillow Problems Thought Out During Wakeful Hours”. His Problem 58 posits “three points taken at random on an infinite plane”, and asks for the probability that they are the vertices of an acute triangle. (Actually he asked about obtuse triangles, which is obviously just the complement.) However, as plausible as this question might sound, the premise of choosing three points “randomly” in the plane is not valid – at least not without giving a probability distribution for the points. There is no uniform distribution over the plane, so the notion of choosing three points randomly on the plane is inherently ambiguous.

They continue by showing that his method leads to contradictions.

The Falk article mentioned here starts with a similar comment:

The disturbing element in the problem’s text is the random sampling of points on an infinite plane. How could this be? This is a practically and conceptually impossible procedure. Put explicitly, Carroll assumed (1) that the sample-space for his “statistical experiment” is infinite, and (2) that the probability-density function on that space is uniform. However, these are two contradictory assumptions that are sooner or later bound to entail paradoxical results (see Falk & Konold, 1992).

They go on to show a different contradiction, and then use a method similar to our first problem above (restricting the sample space to a large square, and taking the limit as it increases) to obtain an answer, namely 0.7249, in comparison to Carroll’s 0.6394.
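The restrict-and-take-the-limit idea is easy to try numerically. This Monte Carlo sketch is my own, not the article's method; the names, seed, and trial counts are arbitrary. It picks three uniform points in a square and counts obtuse triangles, using the fact that the angle at vertex *p* is obtuse exactly when \((q-p)\cdot(r-p)<0\). Because scaling a square does not change any angles, the estimate should not depend on the side length, which is why the limit as the square grows is well defined; the estimates should land near the 0.7249 quoted above, well away from Carroll's 0.6394.

```python
import random

def is_obtuse(a, b, c):
    """True if triangle abc has an obtuse angle."""
    def dot_at(p, q, r):
        # (q - p) . (r - p) is negative exactly when the angle at p is obtuse
        return (q[0] - p[0]) * (r[0] - p[0]) + (q[1] - p[1]) * (r[1] - p[1])
    return dot_at(a, b, c) < 0 or dot_at(b, c, a) < 0 or dot_at(c, a, b) < 0

def obtuse_fraction(side=1.0, trials=100_000, seed=2024):
    """Estimated probability that three uniform points in a side-by-side square form an obtuse triangle."""
    rng = random.Random(seed)
    def point():
        return (rng.uniform(0, side), rng.uniform(0, side))
    hits = sum(is_obtuse(point(), point(), point()) for _ in range(trials))
    return hits / trials

for side in (1.0, 1000.0):
    print(side, obtuse_fraction(side))
```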

We’ll close with a question from 2007, which brings us back to the main ideas from last time:

Probability of a Sum Meeting a Condition

Two real numbers x and y are randomly chosen on a number line between 0 and 12. Find the probability that their sum is less than or equal to 5.

I am not sure how to do this. My idea below is probably wrong, and I need to know how to do the problem.

0+1=1, 0+2=2, 0+3=3, ..., 12+12=24

(number of sums with x + y <= 5) / (# of trials) = probability

This would be a good start, if the problem were a finite one involving only 13 possible numbers. But this is a continuous problem, which makes it more interesting.

Doctor Greenie answered:

Hi, Darren -

Your thoughts about the problem only use whole numbers. The problem says you can pick ANY two real numbers between 0 and 12--like 5.189238945 and the square root of 93. So you will need a different method to attack the problem.

To find an approach to the problem, let's START with your work with whole numbers and then modify that approach to include all numbers between 0 and 12.

We can make a chart showing all the sums of pairs of whole numbers from 0 to 12; we get our familiar addition table, but I'm going to arrange it in an unusual manner.

|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| 11 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| 10 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| 9 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 8 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| 7 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
| 6 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| 5 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| 4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| 3 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

This amounts to what Darren described, listing all possible sums, so we can count those less than or equal to 5; but it does so in a way that will be easier to count. (You don’t actually have to write out the entire table; Doctor Greenie only showed some of the numbers, though I’ve filled them all in to make it neater. You just have to visualize the idea!)

Suppose for now that we were restricting our numbers to whole numbers, and that we wanted the sum of the two numbers to be 5 or less. Then we could get the answer directly from this table. There are 13 whole numbers from 0 to 12, so the number of sums in the table is 13*13 = 169. And we can count the number of sums that are 5 or less; it is 21. So using only whole numbers, the probability that our sum is 5 or less is 21/169.

I put the desired sums in red above to make this visible. If we didn’t actually write it all out, we could see in our minds that the count will be \(1+2+3+4+5+6=21\); and if the numbers were larger, we could use knowledge of triangular numbers or of arithmetic series to simplify the work. The probability comes to 0.124, nearly 1/8.
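If you want to confirm the count of 21 without drawing the table, here is a short Python sketch. It is my own illustration rather than part of Doctor Greenie's answer, and the function name is made up. Like the table, it assumes that all ordered pairs of whole numbers from 0 to 12 are equally likely.

```python
from fractions import Fraction
from itertools import product

def whole_number_probability(limit, top=12):
    """Probability that two whole numbers from 0..top (every ordered pair
    equally likely) have a sum of at most `limit`."""
    pairs = list(product(range(top + 1), repeat=2))   # (top+1)**2 ordered pairs
    favourable = sum(1 for x, y in pairs if x + y <= limit)
    return Fraction(favourable, len(pairs))

p = whole_number_probability(5)
print(p, float(p))  # 21/169, about 0.124
```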

To modify this approach so that we consider ALL real numbers instead of just whole numbers, we can keep the same basic picture, but instead of having separate, discrete numbers horizontally and vertically, we have a continuous range of numbers. So we can think of our picture as a square 12 units wide and 12 units high:

[Figure: a 12-by-12 square, with 0 to 12 marked along the bottom edge and up the left side.]

The complete set of combinations of two numbers we could select from anywhere in this figure is represented by the AREA of the figure, which is 12*12 = 144. Our objective is to determine what fraction of that total area represents pairs of numbers whose sum is 5 or less.

Every pair \((x,y)\) of numbers in the interval \((0,12)\) is represented by a point in this square. (The rows were numbered from bottom to top to match the coordinates in the first quadrant.)

We can get an idea of how to do that by looking at the figure we had when we were using whole numbers. The pairs of numbers whose sum is exactly EQUAL to 5 lie along a diagonal line from (0,5) to (5,0). So in our figure for the case where we are allowing ANY real numbers, we can draw a boundary for the sums that are 5 or less along that diagonal line:

[Figure: the same square, with a boundary line from (0,5) to (5,0) cutting off a triangle in the lower-left corner.]

The set of combinations of two numbers we can choose whose sum is 5 or less is represented by the AREA of the triangle formed by that boundary line. That triangle is a right triangle with legs of length 5, so its area is (1/2)(5)(5) = 12.5.

The boundary line has equation \(x+y=5\), and has intercepts \((0,5)\) and \((5,0)\).

Here’s a better graph of these regions:

So the total area from which we can pick our two real numbers between 0 and 12 is 144, and the total area for which the sum of those two numbers is 5 or less is 12.5. Therefore, the probability that the two numbers we pick between 0 and 12 have a sum of 5 or less is 12.5/144 = 25/288.

This probability is 0.0868, about 1/12.

The method for solving the problem is the same regardless of what the maximum value of our sum is supposed to be. For the case where the sum is supposed to be at most 12, you can probably see that the "boundary line" will cut the rectangle exactly in half, so the probability will be 1/2.

Here is that region:

For the case where the sum is supposed to be (for example) at most 18, the picture is a bit different, and the calculations might be done a bit differently. Our picture would be

[Figure: the square, with a boundary line from (6,12) to (12,6) cutting off a small triangle in the upper-right corner.]

In this case, the area containing the allowable pairs of numbers is represented by the whole rectangle, MINUS the small triangular area. The area of the whole rectangle is 144; the area of the small triangle is (1/2)(6)(6) = 18. So the area of the allowable region is 144-18 = 126. And the probability that our sum is at most 18 in this case is then 126/144 = 7/8.
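The three cases worked out above fit into a single piecewise area formula. The Python sketch below is an illustration, not part of the original answer. It makes the same assumption as the area model: the two numbers are chosen independently and uniformly on [0, 12]. The function name and the `top` parameter are hypothetical. It uses exact fractions so the results can be compared directly with 25/288, 1/2 and 7/8.

```python
from fractions import Fraction

def sum_at_most_probability(s, top=12):
    """P(X + Y <= s) for X, Y independent and uniform on [0, top], computed as
    (area of the part of the square where x + y <= s) / (area of the square)."""
    s, top = Fraction(s), Fraction(top)
    if s <= 0:
        area = Fraction(0)
    elif s <= top:
        area = s * s / 2                            # triangle in the lower-left corner
    elif s <= 2 * top:
        area = top * top - (2 * top - s) ** 2 / 2   # square minus upper-right triangle
    else:
        area = top * top                            # the whole square
    return area / (top * top)

for s in (5, 12, 18):
    print(s, sum_at_most_probability(s))
# 5 25/288
# 12 1/2
# 18 7/8
```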

This area model of probability is what we used last time; and the issue there was that different models of how we choose a random triangle would give different areas, and different probabilities.


It was not actually a question, but a request for confirmation:

A couple of decades ago I came up with the attached, which seems to show that three quarters of all possible triangles are obtuse. My question is to ask if this treatment has any mathematical validity. If so, are there any conditions on this? Over to you!

Ian Quayle

Here is his paper:

Triangles

Construct any triangle from a base line and two angles.

Now, for the open-ended lines to converge and so form a triangle, x + y < 180°.

Now plot the line x + y = 180° on x, y co-ordinates as the boundary of a map of all possible values of x and y.

The green line shows the boundary of all possible values of x and y that would make a triangle.

For isosceles triangles, either: x = y (plotted as the red line through the origin) or,

x or y = the third angle (z), in which case, for example,

x = 180° – x – y

x = 90° – y/2

and likewise, y = 90° – x/2

This gives the further two loci of isosceles triangles, also shown in red.

The intersection of the three possible isosceles triangles is the equilateral triangle.

The loci of all right-angled triangles are plotted as the three blue lines making up a right-angled triangle (isn’t that beautiful!!). This shows the three possible cases: either x, y or z is a right angle (in the latter case, x + y = 90°).

All triangles having only acute angles therefore lie within the blue boundary. The equilateral triangle, the most acute of all, lies exactly in the middle of this.

By simple geometry, the area containing only acute angled triangles is exactly a quarter of the total area representing all triangles. Hence a quarter of all possible triangles have only acute angles, the remainder have an obtuse angle.

© Ian Quayle 1999

The interior of the green triangle contains all pairs of angles \((x,y)\) that are positive and sum to less than 180°, thereby forming a **valid triangle**. (Points on the green triangle itself represent **degenerate triangles**.) Points on the blue triangle represent **right triangles** (with either *x*, or *y*, or their sum being 90°), and its interior represents **acute triangles**.

Since the area of the blue triangle is 1/4 of the green triangle (which is dissected into four congruent triangles), it seems clear that 1/4 of all triangles are acute. (And the picture is, indeed, beautiful!)
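Here is the area comparison written out. The green triangle has legs of 180° along the axes. The blue triangle, bounded by x = 90°, y = 90° and x + y = 90°, has legs of 90°. So

$$\frac{\text{area of blue triangle}}{\text{area of green triangle}}=\frac{\tfrac12\cdot90^2}{\tfrac12\cdot180^2}=\frac{4050}{16200}=\frac14.$$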

I answered:

Hi, Ian.

Yes, what you say is basically correct.

The only issue is the last line, where we need to add some conditions as you suggest. The statement, “a quarter of all possible triangles have only acute angles” is valid only under an unstated assumption about the distribution of triangles, namely that two of the angles are chosen randomly with a uniform distribution. If they are chosen in a different way, then you would get a different result.

I notice only now that in saying “a quarter of all possible triangles”, he didn’t express what he was finding as a **probability**; but that is what it is. Ordinarily, we’d use his phrase to mean we’d counted all triangles, and found that 1/4 of them are acute. We can’t actually do that, which is what will lead to confusion. What do we mean by such a phrase when there are infinitely many triangles? This is where the idea of a (continuous) probability distribution will come in.

In contrast to discrete probability, we can’t *count* individual outcomes, but instead commonly assume that all locations are equally likely in terms of *area*. But this involves an assumption that is not always easy to justify: How do you know whether every point is “equally likely”, in this sense, when the probability of any individual point is zero? (It can be hard enough to decide whether all outcomes in a *discrete* problem are equally likely.)

We’ll get back to that. First, an alternative version of the problem:

Before I got to the end, I was already thinking about at least one other way to think of a random triangle, which was to choose the side lengths randomly, rather than the angles; and, since triangles with the same angles but different sizes can be thought of as the same shape (that is, similar), we could hold the perimeter constant, say at 1. So in my version, we can choose x and y randomly, with the third side being z = 1 – x – y.

He thought of a triangle as determined by **two randomly chosen angles** (with a fixed side), which is different from **randomly choosing two sides** (with a fixed perimeter), so each will give a different answer. (We can’t just randomly choose all three sides, because they could be any real number, and a uniform distribution would be impossible.) A third option might be to **randomly choose three vertices**, but again we would need to restrict the region somehow.

This takes us into the realm of my post

Broken Sticks, Triangles, and Probability I

where we consider the probability that three randomly chosen lengths (with a given sum) will produce a triangle at all. (Be sure to read that, if you haven’t already, because it discusses the issue of different assumptions.) The diagrams there are reminiscent of yours. I’ve made a similar diagram (on Desmos) for my version of this problem:

Here, the red lines again correspond to isosceles triangles, and all choices of three sides adding to 1 are inside the green lines, but all actual triangles are in the green shaded region. And where are the acute triangles? Between the blue curves in the middle; the curves themselves represent the right triangles. This gives a different answer.

Again, the green region contains all triangles (the area outside it represents sets of sides that don’t satisfy the triangle inequality). This time, the blue curves representing right triangles (and enclosing the acute triangles) do not form a triangle, so the area they enclose will be harder to calculate.

Here is a sample diagram from the referenced page about making triangles:

Here the red region contains all points P(*x*, *y*) where *x* and *y* are points on a unit segment such that the three segments can form sides of a triangle, as shown for point P. This shows that the probability of making a triangle is 0.25, the area of the red region.

The following link takes you to a copy of this demo that you can manipulate, moving point P to see the corresponding triangle:

Demo: Making a triangle by breaking a segment

After doing this, I searched for existing discussions of this problem, and found a paper that starts with your idea, and then covers mine (and a couple wrong ones)!

Probability of a Random Triangle Being Acute

This gives your answer of 0.25 (confirming your work, but not showing your nice representation of isosceles triangles), and gives mine as 0.31776616…

For a variation of Ian’s method, they (mathpages.com) say this:

Assuming the probability density for the triangle configuration points is uniformly distributed in this space, we see that the acute triangles comprise 1/4 of all triangles.

For my method, they say,

However, there are other plausible ways of “randomly” selecting a triangle, leading to different distributions and different probabilities of being acute. … Using the normalizing condition c = 1 – a – b, we can eliminate c from these relations, and we can express the boundaries of the region of acute triangles by

These three boundary curves are shown in the figure below, outlining the region of acute triangles (the darkly shaded area) within the region of all possible triangles.

The area below the main diagonal is 1/8, so the area of the lightly-shaded region bounded by the main diagonal and the curve (1−a)(1−b) = 1/2 is given by

After a few more manipulations, they get a probability of $$12\ln(2)-8=0.31776616\ldots$$

I hadn’t worked out the actual probability from my diagram, but trusted what the author said there, because they agreed with the parts I had done!

I also found another place that combines my article’s question about sticks with my approach to this one, getting the probability that three random lengths form an acute triangle as 0.07944, which is compatible with the other:

This is a LibreTexts textbook (Siegrist); the section finds the probability of three parts of a stick forming an acute triangle as $$P(\text{acute})=3\ln(2)-2\approx0.07944.$$

I say this is compatible because, using our probabilities, $$\begin{aligned}P(\text{valid triangle})&=0.25\\P(\text{acute}\mid\text{valid triangle})&=0.31776616\\P(\text{acute})&=P(\text{valid triangle})\cdot P(\text{acute}\mid\text{valid triangle})\\&=0.25\cdot0.31776616=0.07944154,\end{aligned}$$

giving their result. The agreement is in fact exact, since \(\tfrac14(12\ln 2-8)=3\ln 2-2\).
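A Monte Carlo simulation is one way to check all three numbers empirically. The sketch below is my own illustration and assumes the broken-stick model: a stick of length 1 is broken at two points chosen independently and uniformly. That makes the three pieces uniformly distributed over side triples summing to 1, which is the same as choosing x and y uniformly with z = 1 – x – y. The function name, seed and number of trials are arbitrary choices. The estimates carry sampling error, so they should only come out close to the exact values.

```python
import math
import random

def broken_stick_estimates(trials, seed=2024):
    """Estimate P(valid triangle), P(acute | valid) and P(acute) when a unit
    stick is broken at two independent, uniformly chosen points."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    valid = acute = 0
    for _ in range(trials):
        u, v = sorted((rng.random(), rng.random()))
        a, b, c = u, v - u, 1 - v          # the three pieces (sum is 1 up to rounding)
        longest = max(a, b, c)
        if longest >= 0.5:                 # triangle inequality fails (or degenerate)
            continue
        valid += 1
        # Acute exactly when the longest side squared is less than the sum
        # of the squares of the other two sides.
        if 2 * longest * longest < a * a + b * b + c * c:
            acute += 1
    return valid / trials, acute / valid, acute / trials

p_valid, p_acute_given_valid, p_acute = broken_stick_estimates(200_000)
print(f"P(valid)         ~ {p_valid:.4f}   (exact 0.25)")
print(f"P(acute | valid) ~ {p_acute_given_valid:.4f}   (exact {12 * math.log(2) - 8:.4f})")
print(f"P(acute)         ~ {p_acute:.4f}   (exact {3 * math.log(2) - 2:.4f})")
```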

I’ve been working on a similar diagram in GeoGebra; here it is:

Demo: Making a random triangle from sides

You can drag point P around and see the resulting triangle; here is a right triangle:

Note that in the image, P is on the blue curve, resulting in a right triangle ABC.

I also made a similar demo for your version:

Demo: Making a random triangle from angles

Again, the green region contains all actual triangles, and the blue region is the acute triangles, and you can drag P around to see the triangle it represents:

So, yes, I like your work, and it leads into other interesting ideas.

Here again, P is on the edge of the blue triangle, resulting in a right triangle ABC.

Ian replied, along with the paper I’ve inserted above:

Hi Dave

That’s very nice to get my little idea from over twenty years ago validated by an expert. By the way, I am a well-retired professional electronics engineer who graduated nearly sixty years ago and has forgotten 90% of the maths I learnt at the time!

Regarding the distribution, I don’t understand why that is important. The area contains all possible triangles and there must be an infinity of those. There shouldn’t be a problem in saying that in that infinity of triangles, a quarter are acute. This seems to me the same as saying that a half of all integers are even.

Your GeoGebra model is brilliant and fascinating. I haven’t got round to reading Broken Sticks, Triangles and Probability, but will shortly.

Best wishes

Ian

It is not surprising that he would have trouble with what I said about the uniform probability distribution being an assumption. Most students learn only about discrete probability, which is easily explained in terms of counting, and doesn’t require thinking separately about the distribution. It is the infinite number of possibilities that makes all this tricky.

I responded:

Thanks.

Hopefully, you’ll see the point about the distribution when you read, and ponder, the linked post.

The idea is that how you choose things affects probabilities. A simple example would be that if you select numbers from 1 to 6 by rolling a die, and I do so by throwing darts at a nearby board with concentric rings labeled 1 to 6 aiming carefully at the center, and a child does the same with a distant board (trying again if she misses the board entirely), each method will produce a different distribution of values. The die will produce a uniform distribution, with each number equally likely; I will produce a distribution that is almost all 1’s (assuming I’m an expert just a couple feet away); and the child may produce a distribution in which 6, on the outer ring, is more likely than anything else. The probability of each getting a particular score will be different.

All three illustrations involve randomly chosen numbers, but chosen in different ways, with different distributions. My hope was that this discrete example might help him see what is happening in the continuous problem we are facing. Randomness involves some method of choosing.

In our case, you’re choosing a triangle by choosing angles with a uniform distribution, while I’m choosing sides with a uniform distribution. You’re assuming that the probability of any region on your diagram is proportional to its area, which means triangles are distributed uniformly in terms of angles; I suppose that to be true in my diagram, in terms of sides. Both are reasonable things to do; neither is an error. The “error” is just in not saying what you are assuming.

I added another discrete example:

Another example is seen in More on Gender Probability: Twins; there I mention that the probability of twins is different if you ask how likely it is for a particular child to be a twin, or how likely it is for a particular birth to be twins; there are twice as many twins as there are twin births! So how we choose what we count makes a difference in the probability. But sources often fail to state how they count.

A similar issue lies behind Frequently Questioned Answers: The Other Child, where different ways of meeting a child result in different (and often confusing) probabilities.

Ian still wasn’t convinced:

Many thanks for your explanation, Dave. I do understand about probability distribution, but not sure why that applies in this case.

I am not choosing anything, simply covering all possible triangles. The number of possible triangles is infinite, so the probability of randomly selecting any particular triangle must be zero.

So, if we relax the problem by quantizing the possible angles into, say, seconds of arc (3600 arc seconds per degree), the number of possible triangles is (180 * 3600)^2 / 2. That’s a very big number but not infinity! You could now represent all possible triangles (quantized) then as the intersections of a fine grid aligned with the axes, within the area of all possible triangles. The intersections of the grid are 1 arc second apart and each intersection represents a unique triangle. It is clear from my original drawing that there will be only a quarter of these intersections within the “acute triangle” area.

If you quantize to even smaller angle increments, the same logic applies, so in the limit the answer remains the same. A quarter of all possible triangles are acute.

If you now wanted to apply your random selection ideas, you could draw a three-dimensional surface over my diagram, with the z axis representing probability, but is that relevant to the point I am trying to make?

Kind regards

Ian

There are two main points here: **whether there is a choice** being made, and how to handle **infinite possibilities**.

I replied:

I’m taking my time trying to find a better way to explain what I’m saying, since my examples clearly didn’t work, and presumably reading my past post (and its companion) didn’t either. The beginning of the first discusses at some length how defining randomness of a particular entity requires a choice of a process for “random” selection of that entity.

You say,

I am not choosing anything, simply covering all possible triangles. The number of possible triangles is infinite, so the probability of randomly selecting any particular triangle must be zero.

It seems to me that the paper I linked to says it as clearly as I can:

Again, if the probability is uniformly distributed throughout the combined region, it follows that the probability of a randomly selected triangle being acute is exactly 1/4. However, there are other plausible ways of “randomly” selecting a triangle, leading to different distributions and different probabilities of being acute. …

Whenever you talk about “random” anything, you have to assume some way of selecting it, even if you don’t think you are “choosing”. You need to consciously decide what distribution is appropriate (or just make it explicit), rather than unthinkingly accept some default; and a distribution is a way of “choosing”.

The mere fact that we can get different answers when we model the space of possible triangles differently should be enough to convince you that you have made a choice that affected the answer, and therefore your answer is not the only valid one.

We might compare this idea of different “models” to the existence of different map projections. In some maps, Greenland appears larger than South America, which is actually about 8 times as large. Here is the Mercator projection, compared to the equal-area Gall-Peters projection:

So if you asked which is a larger fraction of the earth (a larger probability of being hit if you chose a random spot on the globe), you would get a wrong answer if you used a map where area is distorted. And you’d get yet another answer if you chose a random *person* on the globe.

Similarly, Ian and I made different “maps” of the space of triangles, based on creating a triangle in different ways.

His approach to the infinite number of possible triangles comes very close to what we do formally in calculus, and may in fact clarify the issue.

As for the fact that there are infinitely many possible triangles, that’s just what happens with any continuous distribution (as opposed to discrete distributions, where there are a finite number of possibilities, which is what students typically are most exposed to in pre-calculus courses on probability). Your thoughts there amount to a way to approach the infinite case without formally using calculus. Your area model of probability accomplishes the same thing: we assume that the probability of an event is proportional to the area representing that event.

But that assumption inherently supposes a uniform distribution of the parameters you use in your model, and other parametrizations of the probability space will correspond to different distributions that yield different probabilities. That’s what my alternative does: it uses different parameters, taken to be uniformly distributed, resulting in a different distribution of the triangles.

Taking his idea of dividing the area into tiny regions, we see that he is assuming that each of those possible regions, corresponding to a tiny range of angles, is **equally likely**. And they will be, **if** you make a random triangle by choosing a random pair of angles, using a uniform distribution for each. If you imagine making a triangle starting with something different (such as my starting with a pair of lengths), then each tiny region in *my* diagram will be equally likely, and my version is correct.
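Ian's quantization idea can be carried out literally for both parametrizations. This sketch is my own illustration; the function names and the grid size n are hypothetical choices. In the first function the angles, and in the second the sides (as fractions of a fixed perimeter), are restricted to multiples of 1/n of their total. Each grid point is then counted once, which is exactly the assumption that every tiny region is equally likely. The two counts approach different limits: 1/4 for angles and 12 ln 2 – 8 ≈ 0.3178 for sides. At a finite n they are only approximations to those limits.

```python
import math
from fractions import Fraction

def acute_fraction_by_angles(n):
    """Grid version of Ian's model: angles x, y, z = n - x - y are positive
    integers in units of 180/n degrees. Returns (acute count) / (total count)."""
    total = acute = 0
    for x in range(1, n):
        for y in range(1, n - x):
            z = n - x - y              # third angle, at least 1 unit here
            total += 1
            if 2 * max(x, y, z) < n:   # largest angle below 90 degrees
                acute += 1
    return Fraction(acute, total)

def acute_fraction_by_sides(n):
    """Grid version of the side-length model: sides a, b, c = n - a - b are
    positive integers in units of 1/n of the perimeter. Returns the fraction
    of genuine (non-degenerate) triangles that are acute."""
    total = acute = 0
    for a in range(1, n):
        for b in range(1, n - a):
            c = n - a - b
            longest = max(a, b, c)
            if 2 * longest >= n:       # longest >= sum of the other two: no triangle
                continue
            total += 1
            if 2 * longest ** 2 < a * a + b * b + c * c:  # longest^2 < sum of other squares
                acute += 1
    return Fraction(acute, total)

n = 720  # four grid steps per degree in the angle model
print(float(acute_fraction_by_angles(n)))   # close to 1/4
print(float(acute_fraction_by_sides(n)))    # close to 12 ln 2 - 8
print(12 * math.log(2) - 8)
```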

Both answers are correct, if we specify the problem according to our parameters. As I said at the start, the only error is in not specifying that one’s answer applies only to the specified model.


Here is the question, from the last day of 2023:

With a sequence of numbers between 1 and 49 inclusive – how many combinations would occur for there to be 6 numbers with 2 and only 2 of the 6 being consecutive.

Example: 5, 15, 16, 27, 35, 42

This needed clarification; it can be hard to state a probability question clearly, especially in the abstract, just as it can be hard to interpret a problem when you read it. I answered,

Hi, Bill.

I don’t fully understand the question.

Are you supposing we have been given a sequence of numbers (of any length), and you want to know how many subsequences of 6 terms there are that contain exactly one pair of consecutive numbers? That would depend on the particular sequence given; are you looking for an algorithm to find this number?

Or are you asking how many sequences (increasing sequences of 6 distinct numbers, as in the example?) there are with exactly one pair of consecutive numbers? Or perhaps you really mean sets rather than sequences?

Perhaps you can make this clearer by describing the context of your question. What sort of statistical problem are you working on? Or, if this is a problem you were given, can you state it exactly as asked, rather than a paraphrase?

The word “sequence” in a combinatorial question generally indicates that *order matters* and *repetition is allowed*, and in particular that the same numbers could be put in different orders and counted as different. It will turn out that in fact order *doesn’t* matter in this sense, and repetition is *not* allowed; this is what I suggested in my alternative interpretation.

Bill replied:

It’s a lottery situation – 6 numbers from 49 = 13,983,816 combinations so I was wondering how many combinations would be if two of the numbers were consecutive in the sequence?

This provides both a clear context, and the right formal word: A **lottery** involves a **combination**: a set of distinct numbers, in which order is not distinguished (so that listing the numbers in *increasing* order fully describes them).

And, indeed, the number of combinations of 6 chosen from 49 is $$\require{AMSmath}{\binom{49}{6}}=\frac{49!}{6!(49-6)!}=\frac{49!}{6!43!}=\frac{49\cdot48\cdot47\cdot46\cdot45\cdot44}{6\cdot5\cdot4\cdot3\cdot2\cdot1}=13{,}983{,}816$$

Now I could answer:

That does clarify the question.

Now, in order to help me know what kind of answer to give, can you tell me whether you ask in order to learn how to calculate combinations, or for some practical reason? (or just from curiosity)

Meanwhile, I’ll see how hard it is to find. (Some combinatorial problems are harder to solve than they’re worth, but this probably isn’t too bad. It’s worth more, though, if you learn something useful from it …)

When the purpose is to learn, I avoid giving too much of the solution; here that probably isn’t the purpose, but I wanted to give Bill time to think, if he chose to.

However, after a 24-hour wait (over New Year’s Eve) with no response and plenty of time to ponder, I changed my mind; it didn’t sound like a homework problem, and it would probably be too hard to lead anyone to discover the solution for himself, so why not do it?

I was able to solve the problem, and it’s interesting, so I’ll just answer the question fully. Here is my work:

First, as you know, the total number of ways to choose 6 of 49 numbers is C(49, 6) = 13,983,816.

The probability is the number of ways to “succeed” (in this case, to make a combination with exactly one pair of consecutive numbers), over the total number of ways to make *any* combination. So this number, which he already knew, will be the denominator. It’s the numerator that is tricky!

I often start by modeling what we want in some physical way, whether it’s balls in an urn, cards in a hand, letters in a word, or whatever. Here, where our original model was a list of 6 numbers, I chose to think first of putting all 49 possible numbers in a row and *selecting* 6 of them, and then to transform that image into 49 *places* in a row, putting markers *into* 6 of those places.

A choice of 6 distinct numbers from 1 to 49 can be visualized as putting marbles in 6 of 49 slots:

`_ _ _ _ _ _ _ _ o _ _ _ _ _ _ _ o _ o _ _ _ _ _ o _ _ _ _ _ _ _ _ _ o _ _ _ o _ _ _ _ _ _ _ _ _ _`

= 9, 17, 19, 25, 35, 39

A choice with exactly two consecutive numbers corresponds to a placement of marbles with exactly one pair adjacent:

`_ _ _ _ _ _ _ _ o _ _ _ _ _ _ _ o o _ _ _ _ _ _ o _ _ _ _ _ _ _ _ _ o _ _ _ o _ _ _ _ _ _ _ _ _ _`

= 9, 17, 18, 25, 35, 39

We can now imagine gluing that pair together to make a special object. That gives us one less object, and one less slot.

This is equivalent to a choice of 5 out of 48 slots, none of them adjacent, with one of the 5 distinguished:

`_ _ _ _ _ _ _ _ o _ _ _ _ _ _ _ * _ _ _ _ _ _ o _ _ _ _ _ _ _ _ _ o _ _ _ o _ _ _ _ _ _ _ _ _ _`

Thus, our answer will be 5 times the number of ways to choose 5 of 48 slots with none adjacent:

`_ _ _ _ _ _ _ _ o _ _ _ _ _ _ _ o _ _ _ _ _ _ o _ _ _ _ _ _ _ _ _ o _ _ _ o _ _ _ _ _ _ _ _ _ _`

That is, we are now first selecting 5 non-adjacent slots out of the 48, and then choosing 1 of those 5 into which to put two marbles.

But we still need a way to make sure none of the slots we pick are adjacent.

Now, how can we count those? I’ll be using a method explained in

Stars and Bars: Counting Ways to Distribute Items

We can make such an arrangement by first putting the 5 marbles in a row,

`o o o o o`

and then placing 43 empty slots among them, making sure to put at least one in each of the 4 intervals between the marbles, but allowing the spaces at either end to remain empty.

I think of this process as turning the problem inside-out: Rather than putting marbles in slots, we are putting slots among the marbles! But since we want to make sure there are empty slots between each pair of marbles, while allowing the possibility of no space at the ends, that model is not yet right.

We’ll turn the model inside-out again, this time treating the empty slots as objects in themselves, and not putting marbles into slots, but in spaces between them:

But we can count more easily if we reverse the process: First place the 43 empty slots, leaving 44 places between and around them in which to put 5 marbles, no more than one in each space!

`?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?`

So there are C(44, 5) = 1,086,008 ways to do this, and the final answer will be 5 times that:

5 * 1,086,008 = 5,430,040.

Putting marbles into separate spaces between empty slots ensures that they are not consecutive.
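The key counting fact here can be checked by brute force on small cases. It says that the number of ways to choose k of n slots with no two adjacent is C(n – k + 1, k), which for 5 of 48 slots gives C(44, 5). This is a minimal Python sketch; the helper name is my own.

```python
from itertools import combinations
from math import comb

def count_non_adjacent(n, k):
    """Brute force: k-element subsets of {1..n} with no two consecutive numbers."""
    return sum(
        1
        for c in combinations(range(1, n + 1), k)   # tuples in increasing order
        if all(b - a >= 2 for a, b in zip(c, c[1:]))
    )

# Compare the brute-force count with the stars-and-bars count C(n - k + 1, k).
for n in range(1, 13):
    for k in range(n + 1):
        assert count_non_adjacent(n, k) == comb(n - k + 1, k)

print(comb(44, 5))  # 1086008 ways to pick 5 non-adjacent slots out of 48
```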

What we’ve done here is sort of like solving a puzzle box, turning the box to different orientations and pushing a button here, then another there, until it falls open. And what’s in the box? The probability we’re looking for:

The probability of exactly two consecutive numbers is 5,430,040/13,983,816 = 0.3883. So more than 1/3 of the time, you will expect this to happen.

There’s the surprise! The question presumably arose from seeing this happen often in lottery numbers, and thinking that the observed frequency was greater than expected.

Putting it together, our calculation was $$\frac{5\cdot\binom{44}{5}}{\binom{49}{6}}=\frac{5{,}430{,}040}{13{,}983{,}816}\approx0.3883$$
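The whole computation fits in a few lines of Python, using the standard library's `math.comb` for the binomial coefficients. This is a quick check rather than part of the original answer.

```python
from math import comb

total = comb(49, 6)            # all possible tickets
favourable = 5 * comb(44, 5)   # tickets with exactly one consecutive pair
print(total, favourable, round(favourable / total, 4))
# 13983816 5430040 0.3883
```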

One technique for getting started with a problem is to try a smaller version of the problem first; but we can also use a smaller problem as a way to check a general formula. This is, in fact, a major reason for the technique of solving a generalized problem in order to solve a particular problem. Here, we’ll generalize what we’ve already done, just so we can do that check.

Let’s check with a smaller example, something I like to do when I am not entirely sure of my work in a problem like this.

First, the general formula for a lottery of k numbers out of n, following the same reasoning as for our k=6, n=49, is (k-1)*C(n-k+1, k-1)/C(n,k).

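This is $$\frac{(k-1)\cdot\binom{n-k+1}{k-1}}{\binom{n}{k}},$$ which for \(k=6\) and \(n=49\) becomes $$\frac{5\cdot\binom{44}{5}}{\binom{49}{6}}=\frac{5{,}430{,}040}{13{,}983{,}816}\approx0.3883,$$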

and corresponds to our calculation above.

If we choose 3 numbers from 1-6, taking k=3 and n=6, this gives 2*C(4, 2)/C(6,3) = 2*6/20 = 0.6.

This example is small enough to list all outcomes:

In fact, here are the 20 possible choices, with the single pairs in bold:

- 123 (two pairs), **124**, **125**, **126**, **134**, 135, 136, **145**, 146, **156**
- 234 (two pairs), **235**, **236**, **245**, 246, **256**
- 345 (two pairs), **346**, **356**
- 456 (two pairs)

That’s 12/20 = 0.6, just as the formula said. So I’m confident applying it to the real lottery.
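The same check can be automated for many small lotteries at once, not just k = 3, n = 6. In this sketch the helper names are my own. A run such as 17, 18, 19 counts as two adjacent pairs and is therefore excluded, which matches the "(two pairs)" labels in the list above. The formula is assumed to apply for k ≥ 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def exactly_one_pair_formula(n, k):
    """(k-1) * C(n-k+1, k-1), for k >= 1: glue the pair into one marble, choose
    k-1 non-adjacent slots out of n-1, then pick which marble is the glued pair."""
    return (k - 1) * comb(n - k + 1, k - 1)

def exactly_one_pair_brute(n, k):
    """Count k-subsets of 1..n containing exactly one pair of consecutive numbers."""
    count = 0
    for combo in combinations(range(1, n + 1), k):   # tuples in increasing order
        adjacent = sum(1 for a, b in zip(combo, combo[1:]) if b - a == 1)
        if adjacent == 1:
            count += 1
    return count

print(exactly_one_pair_brute(6, 3), exactly_one_pair_formula(6, 3))  # 12 12
print(Fraction(exactly_one_pair_formula(6, 3), comb(6, 3)))          # 3/5
```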

As a further check, I looked up results for the Canadian “Lotto 6/49” for 2023 to the present, and of 114 draws, 46 had exactly one consecutive pair, 11 had two, and 2 had three:

- 25th February 2023: 2, 23, 24, 25, 39, 40
- 4th October 2023: 18, 19, 26, 27, 29, 30

This means that 40.35% had exactly one, and 51.75% had at least one. This is reasonably close to our calculation of 38.8% for exactly one.

One more thought: When I searched to see if this was a well-known question, I found a paper about a broader problem, one conclusion of which is that the probability of at least one pair of consecutive numbers is 0.495198. As expected, this is higher than ours (almost 1/2, in comparison to over 1/3). In fact, we can use what we did above to solve this one easily:

The probability that there is at least one pair of consecutive numbers is 1 minus the probability that there are no pairs. So we need to count the ways to place 6 marbles and 43 empty spaces, with at least one of the latter between any two of the former:

`_ _ _ _ _ _ _ _ o _ _ _ _ _ _ _ o _ o _ _ _ _ _ o _ _ _ _ _ _ _ _ _ o _ _ _ o _ _ _ _ _ _ _ _ _ _`

We can do this by choosing 6 of the 44 places

between or aroundthe 43 empty spaces:?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?_?

This is C(44, 6) = 7,059,052; the probability of no pairs is 7,059,052/13,983,816 = 0.504802, and our answer is 1 – 0.504802 = 0.495198.
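Here is the same check for the complementary count, again using `math.comb`. This is a sketch of the computation above, not the method of the cited paper.

```python
from math import comb

no_pairs = comb(44, 6)            # 6 marbles in the 44 gaps around 43 empty slots
p_none = no_pairs / comb(49, 6)   # probability of no consecutive pair
print(no_pairs, round(p_none, 6), round(1 - p_none, 6))
# 7059052 0.504802 0.495198
```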

That is close to our observed probability of 51.75%. Conclusion:

Pairs of consecutive numbers are considerably more common than you might expect!

Bill replied,

Excellent answer – thank you very much for giving such a clear explanation.

Presumably, whatever the goal, it was satisfied.


First, let’s look at the Ask Dr. Math FAQ on the subject:

What is Russian peasant multiplication? How do I use it?

The way most people learn to multiply large numbers looks something like this:

$$\begin{array}{r}86\\\times\ 57\\\hline602\\+\ 4300\\\hline4902\end{array}$$

If you know your multiplication facts, this “long multiplication” is quick and relatively simple. However, there are many other ways to multiply. One of these methods is often called the Russian peasant algorithm. You don’t need multiplication facts to use the Russian peasant algorithm; you only need to double numbers, cut them in half, and add them up. Here are the rules:

- Write each number at the head of a column.
- Double the number in the first column, and halve the number in the second column.
- If the number in the second column is odd, divide it by two and drop the remainder.
- If the number in the second column is even, cross out that entire row.
- Keep doubling, halving, and crossing out until the number in the second column is 1.
- Add up the remaining numbers in the first column. The total is the product of your original numbers.

You may recognize similarities between this and Egyptian multiplication; there, instead of halving the second factor, we doubled starting with 1, and subtracted to choose rows that add up to the second factor. Both methods make it unnecessary to learn multiplication tables (or even, as we saw last time, to have a place-value number system).

Let’s multiply 57 by 86 as an example:

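Write each number at the head of a column.

| Double | Halve |
|---:|---:|
| 57 | 86 |

Double the number in the first column, and halve the number in the second column.

| Double | Halve |
|---:|---:|
| 57 | 86 |
| 114 | 43 |

If the number in the second column is even, cross out that entire row.

| Double | Halve |
|---:|---:|
| ~~57~~ | ~~86~~ |
| 114 | 43 |

Keep doubling, halving, and crossing out until the number in the second column is 1.

| Double | Halve |
|---:|---:|
| ~~57~~ | ~~86~~ |
| 114 | 43 |
| 228 | 21 |
| ~~456~~ | ~~10~~ |
| 912 | 5 |
| ~~1824~~ | ~~2~~ |
| 3648 | 1 |

Add up the remaining numbers in the first column.

| Double | Halve |
|---:|---:|
| ~~57~~ | ~~86~~ |
| 114 | 43 |
| 228 | 21 |
| ~~456~~ | ~~10~~ |
| 912 | 5 |
| ~~1824~~ | ~~2~~ |
| + 3648 | 1 |
| 4902 | |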

And there’s the answer, like magic! Of course, it’s not really magic; we’ll look into *why* it works next.
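Here is a minimal Python sketch of the rules as the FAQ states them, assuming non-negative whole numbers; `russian_peasant` is a hypothetical name, not anything from the FAQ. It returns the product along with the full table and the rows that survive the crossing out.

```python
def russian_peasant(first, second):
    """Double the first column, halve the second (dropping remainders),
    cross out rows whose second number is even, and add what is left."""
    rows = []
    while second >= 1:
        rows.append((first, second))
        first, second = first * 2, second // 2
    kept = [(f, s) for f, s in rows if s % 2 == 1]
    return sum(f for f, _ in kept), rows, kept


product, rows, kept = russian_peasant(57, 86)
for f, s in rows:
    note = "" if s % 2 == 1 else "  (crossed out)"
    print(f"{f:>5} {s:>3}{note}")
print(product)   # 4902
```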

Real Russian peasants may have tracked their doublings with bowls of pebbles, instead of columns of numbers. (They probably weren’t interested in problems as large as our example, though; four thousand pebbles would be hard to work with!) Russian peasants weren’t the only ones to use this method of multiplication. The ancient Egyptians invented a similar process thousands of years earlier, and computers are still using related methods today.

Computers use essentially this method, but in binary, where doubling and halving both amount to mere digit-shifting. We’ll see what that looks like soon.

Good students want to know why it works, not just how to do it. We’ll look at two such questions, so that if one doesn’t quite work for you, the other might. Here is the first, from 1998:

**Russian Peasant Method of Multiplication**

I understand the 'Russian peasant' method of multiplication, but I do not understand why it works. ex:

- ~~39 x 52~~
- ~~78 x 26~~
- 156 x 13 (double and halve)
- ~~312 x 6~~
- 624 x 3
- 1248 x 1

156 + 624 + 1248 = 2028 (add only left with odd right)

Can you explain? Thanks

I answered (this was probably my first exposure to the method):

Hi, Kara, Russian Peasant Multiplication is actually a way of simultaneously converting a number to binary and multiplying it by another number. To show the relation clearly, I'll work with a small example, 10 x 6. I'm going to assume that you have at least some knowledge of binary numbers; if not, write back and I can rephrase this more simply.

What follows is one of several methods of base conversion; we’ll have a series on bases sometime soon.

To convert the number 10 to binary, we divide it by two repeatedly and note which divisions give a remainder (of 1):

$$\begin{aligned}10\div2&=5\text{ r }0&&\to0\\5\div2&=2\text{ r }1&&\to1\\2\div2&=1\text{ r }0&&\to0\\1\div2&=0\text{ r }1&&\to1\end{aligned}$$

The answer is 10 = 1010 in binary (reading the remainders upwards). This means that

$$10=1\cdot2^3+0\cdot2^2+1\cdot2^1+0\cdot2^0=2^3+2^1=8+2$$

The last line says what it means to say that \(10_{ten}=1010_{two}\): The place values are 8, 4, 2, 1 respectively, so 1010 base 2 means $$1\cdot8+0\cdot4+1\cdot2+0\cdot1=8+2=10.$$

To see why this works, just write everything (except the 2) in binary:

$$\begin{aligned}1010\div2&=101\text{ r }0&&\to0\\101\div2&=10\text{ r }1&&\to1\\10\div2&=1\text{ r }0&&\to0\\1\div2&=0\text{ r }1&&\to1\end{aligned}$$

All we're really doing is peeling off the rightmost digit at each step.

Each division by 2 produces a remainder that is the next binary digit, starting at the right. A computer thinks of this as shifting the number one place to the right, with the last digit shifting into the remainder.
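The repeated-division conversion can be sketched in a few lines of Python (an illustration only; `to_binary` is a made-up name, and Python’s built-in `format(n, "b")` does the same job).

```python
def to_binary(n):
    """Write a non-negative integer in base 2 by dividing by 2 repeatedly."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # each remainder is the next digit, from the right
        remainders.append(str(r))
    return "".join(reversed(remainders))   # read the remainders upward


print(to_binary(10))           # 1010
print(int(to_binary(10), 2))   # 10, converting back as a check
```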

Now we’ve found that we can write 10 as the sum 8 + 2, a sum of powers of two.

Now to multiply 10 by any other number, we just have to use the distributive rule to multiply that number by each power of 2 that is present in 10:

$$10x=(8+2)x=(2^3+2^1)x=2^3x+2^1x$$

We can find these multiples of powers of two by starting with the given number and doubling it repeatedly:

$$\begin{aligned}2^0\cdot6&=6\\2^1\cdot6&=6\cdot2=12\\2^2\cdot6&=12\cdot2=24\\2^3\cdot6&=24\cdot2=48\end{aligned}$$

The answer to \(10\cdot6\), then, is \((2^3\cdot6)+(2^1\cdot6)=48+12=60\).

Here we’ve doubled the second factor, 6, repeatedly, and added together those which we need based on the binary of the first factor, 10.

Now we can put everything together in one little chart:

| Divide by 2 | Remainder | Power of 2 | Double | Sum |
|---:|---:|---:|---:|---:|
| 10 | | | | |
| 5 | 0 | 1 | 6 | |
| 2 | 1 | 2 | 12 | 12 |
| 1 | 0 | 4 | 24 | |
| 0 | 1 | 8 | 48 | 48 |
| | | | | 60 |

The two leftmost columns find the binary digits for 10, the next two find 6 multiplied by powers of two, and the last column sums the powers of two that form 10, multiplied by 6, to get the result.

So we add the doubles of 6 corresponding to the 1’s in the binary representation for 10. In the table above, the doubling starts on the second line, and we shift that column up in the actual method (next) so that the doubles we use are written on the rows with an odd number in the first column (which produces a 1 in the second column).

Russian multiplication just compresses the first, fourth, and fifth columns into a simple format:

| Halve | Double |
|---:|---:|
| 10 | 6 |
| 5 | 12* |
| 2 | 24 |
| 1 | 48* |
| | 60 |

You don't need to write down the remainder, because it's 1 if the number you just divided by 2 is odd. You don't need to write down the powers of two, just the doublings. By marking the doublings that correspond to odd halvings, you select the terms to add to get the result. You may also notice that the lines that were added (marked by asterisks) show the binary value 1010 when you read up.

This version of the algorithm is written backward from the usual, due to the order in which we obtained it, and I’ve marked rows to use (*) rather than crossing out the rows to ignore. Using the form we’ve seen, the work looks like this:

| Double | Halve |
|---:|---:|
| ~~6~~ | ~~10~~ |
| 12 | 5 |
| ~~24~~ | ~~2~~ |
| + 48 | 1 |
| 60 | |

Now, how do computers do the same thing?

This also corresponds to how binary numbers are multiplied, because all we do is multiply 6 by either a one or a zero in each place (which is really just selecting whether to include it in the sum), and shift it one place to the left each time (which is really a doubling):

$$\begin{array}{rl}0110&\quad6\\\times\ 1010&\quad\times\ 10\\\hline0000&\quad(6\cdot1)\\0110\phantom{0}&\quad6\cdot2\\0000\phantom{00}&\quad(6\cdot4)\\0110\phantom{000}&\quad6\cdot8\\\hline0111100&\quad6\cdot10\end{array}$$

I hope that makes it all clear. It's amazing how ancient people (including the Egyptians) multiplied the same way computers do today!

I mentioned the Egyptians because this method is just a refinement of Egyptian multiplication. Here’s how that method works for this same problem, \(6\times10\):

| Double | Power of 2 |
|---:|---:|
| ~~6~~ | ~~1~~ |
| 12 | 2 |
| ~~24~~ | ~~4~~ |
| + 48 | 8 |
| 60 | |

Here, rather than divide 10 by 2 repeatedly in the second column, we multiplied 1 by 2 repeatedly. Then we find a set of those multiples that add to 10, by subtracting from 10: $$10-8=2;\quad2-2=0,\text{ so }10=8+2$$ Therefore, we add the corresponding multiples of 6, and get our answer. The Russian peasant method amounts to the same thing, but without the need to search for addends. Possibly the reason the Egyptians didn’t do this is that halving was a little less convenient for them with their notation. (I don’t know how actual Russian peasants did this!)

Here is a similar question from 2003:

**How Does This Multiplication Method Work?**

I've just learned a new way to multiply, where all you have to do is double, split in half, and add. You double down one column and take halves down the other, dropping remainders, until the halving column reaches 1. Then you cross out the rows where the halving produced an even number and add the remaining numbers in the doubling column. For example, to multiply 27 * 38, it would work like this:

| Double | Halve |
|---:|---:|
| 27 | 38 |
| 54 | 19 |
| 108 | 9 |
| 216 | 4 |
| 432 | 2 |
| 864 | 1 |

Cross out the rows with 38, 4, 2 since they're even, and add the numbers left in the doubling column: 54 + 108 + 864 = 1026, and 27 * 38 = 1026.

Why does this work? How could you extend this to division, where all you have to do is double, halve, and add?

Doctor Tom, also new to the method, answered. He too saw the binary connection, but started with the multiplication rather than the conversion:

Hi Cathy, That's nice! I had not seen this method before. What it amounts to is a standard multiplication, but using base 2. Let me illustrate by showing how a base-2 multiplication of those same two numbers would work:

- 27 = 11011 (in base 2, 16 + 8 + 2 + 1)
- 38 = 100110 (32 + 4 + 2)

These could be obtained by the conversion method I demonstrated above.

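Now if I do base-2 multiplication just like I'd do it in base-10, here's what it would look like when you multiply just before you add:

$$\begin{array}{r}11011\\\times\ 100110\\\hline00000\\11011\phantom{0}\\11011\phantom{00}\\00000\phantom{000}\\00000\phantom{0000}\\11011\phantom{00000}\\\hline\end{array}$$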

Note how simple the multiplications are: 1 times the number is itself, and 0 times the number is 0, so all we’re doing is writing shifted copies of the number. Normally, we would now add all the partial products, giving \(100\ 0000\ 0010_{two}=1026_{ten}\). But we won’t be adding in binary.

Now if you look at the terms that you need to add up to make the final total, each successive row is shifted by 1 place, which is equivalent to a multiplication by 2. Thus the non-zero rows represent, from top to bottom: 27 x 2, 27 x 4, and 27 x 32. And this makes sense: if you add them together, you obtain 27(2 + 4 + 32) = 27 x 38. Notice that in the left column of the sums in your example, we've got 27, 27x2, 27x4, 27x8, and so on, but we've somehow managed to toss out all the ones that shouldn't be there, leaving only 27x2, 27x4 and 27x32.
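Doctor Tom’s point, that the non-zero rows are just shifted copies of 27, can be seen with Python’s bit operators, where `<<` shifts left (doubles) and `>>` shifts right (halves, dropping the remainder). This is a sketch for non-negative integers; `binary_partial_products` is a hypothetical helper.

```python
def binary_partial_products(a, b):
    """Partial products of a * b in base 2, one per binary digit of b,
    starting from b's rightmost digit (as in the written multiplication)."""
    partials = []
    shift = 0
    while b:
        digit = b & 1                                  # rightmost binary digit of b
        partials.append((a << shift) if digit else 0)  # shifting left = doubling
        b >>= 1                                        # drop that digit
        shift += 1
    return partials


rows = binary_partial_products(27, 38)
for row in rows:
    print(format(row, "b"))   # each row in binary, top to bottom
print([r for r in rows if r], sum(rows))   # [54, 108, 864] 1026
```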

There’s just one detail left to understand.

OK, so when do we get non-zero rows? The answer is: whenever there's a "1" in the binary expansion of 38. Let's look at 38 and see how to figure out its binary expansion. The first thing you want to find is if the final digit is 1. That will occur if its final digit is odd. To find the next most significant binary bit, divide by 2, tossing the remainder if necessary, and look at the final digit of that. If it's odd, there will be a 1 in the binary expansion, and so on.

This amounts to our conversion to binary.

When you did successive divisions by 2 to 38 in your example, you got:

38, 19, 9, 4, 2, 1

Look at the odd-even pattern here, where I write 0 for even and 1 for odd:

0, 1, 1, 0, 0, 1

This is just the binary expansion of 38, but in reverse order.

Again, here is the multiplication:

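| Double | Halve | Bit |
|---:|---:|:---:|
| ~~27~~ | ~~38~~ | (0) |
| 54 | 19 | (1) |
| 108 | 9 | (1) |
| ~~216~~ | ~~4~~ | (0) |
| ~~432~~ | ~~2~~ | (0) |
| + 864 | 1 | (1) |
| 1026 | | |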

The numbers we add, 54, 108, and 864, correspond to the non-zero rows in the binary multiplication we did.

I hope that's enough to make it clear to you why the system works. If not, go through a couple of other examples. Notice that there's nothing special about the 27 side; whatever number you put in that column just keeps doubling up and you just need to select the right ones to make the binary multiplication work. What you need to do is convince yourself that the other side where you successively divide indicates the proper positions for 1 bits in the number that you place in the position occupied by 38 in your example.

He didn’t answer the part about how to extend this to division. The Egyptians had a related way to divide, which we saw last time, but I’ve never heard of a “Russian peasant division”, and I can’t think of a way to use both doubling and halving to do it. So the best answer I could have given would be to demonstrate **Egyptian division**.

Let’s use Egyptian division to undo the multiplication we just did, namely to divide 1026 by 27:

| Powers of 2 | Doubles of 27 |
|---:|---:|
| 1 | 27 |
| 2 | 54 |
| 4 | 108 |
| 8 | 216 |
| 16 | 432 |
| 32 | 864 |

We’ve doubled starting at 1 and at the divisor, 27, until the next number in the right column would be greater than the dividend, 1026. Now we take the dividend, and repeatedly subtract the largest number in the right column that does not exceed what we have left: $$1026-{\color{Blue}{864}}=162\\162-{\color{Green}{108}}=54\\54-{\color{Red}{54}}=0$$ We mark the rows we used, and add the numbers in the left column:

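| Powers of 2 | Doubles of 27 | Used |
|---:|---:|---:|
| 1 | 27 | |
| 2 | 54* | 2 |
| 4 | 108* | 4 |
| 8 | 216 | |
| 16 | 432 | |
| 32 | 864* | 32 |
| | | 38 |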

That’s our quotient.

In this Egyptian method, we had to search for the rows to use; Russian multiplication avoids that step, but I see no way to avoid that (or something similar) for division.
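For comparison, here is a Python sketch of the Egyptian division just shown. The search for rows is done greedily, always subtracting the largest double that fits, which is the same choice made above; the function name and the leftover remainder it returns are my own additions for illustration, assuming non-negative whole numbers and a positive divisor.

```python
def egyptian_divide(dividend, divisor):
    """Divide by doubling: returns (quotient, remainder, powers used)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    # Double 1 and the divisor side by side until the next double would
    # exceed the dividend.
    rows = []
    power, double = 1, divisor
    while double <= dividend:
        rows.append((power, double))
        power, double = power * 2, double * 2
    # Subtract the largest usable double each time, and mark its row.
    remaining, quotient, marked = dividend, 0, []
    for power, double in reversed(rows):
        if double <= remaining:
            remaining -= double
            quotient += power
            marked.append(power)
    return quotient, remaining, marked


print(egyptian_divide(1026, 27))   # (38, 0, [32, 4, 2])
```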

Here is how computers might do the same division, if they simply used long division in binary (they actually use much faster methods that are less useful for humans):

$$1026_{ten}=100\ 0000\ 0010_{two}\\27_{ten}=1\ 1011_{two}$$

Then the long division goes like this:

$$\begin{aligned}100\ 0000\ 0010&\\\underline{-11\ 0110\ 0000}&\leftarrow11011\times10\ 0000\\1010\ 0010&\\\underline{-110\ 1100}&\leftarrow11011\times100\\11\ 0110&\\\underline{-11\ 0110}&\leftarrow1\ 1011\times10\\0&\end{aligned}$$

What we did here was to shift the divisor left as far as possible while staying no larger than what was left of the dividend; then subtract and repeat. The first subtraction was shifted left 5 places, so its partial quotient is 10 0000; the second was shifted left 2 places, so its partial quotient is 100; and the third was shifted left 1 place, so its partial quotient is 10. Adding the partial quotients, the quotient is \(10\ 0000+100+10=10\ 0110\), which is 38.

And this amounts to the same subtractions we did in the Egyptian division.
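The binary long division can be sketched with Python’s bit operations, assuming a non-negative dividend and positive divisor. `bit_length()` is a standard method of Python integers; the function name is hypothetical. Each pass records a partial quotient, which should match the 10 0000, 100, and 10 found above.

```python
def binary_long_division(dividend, divisor):
    """Long division in base 2: shift the divisor left as far as it will go
    without exceeding what is left, subtract, and repeat."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remaining, partials = dividend, []
    while remaining >= divisor:
        shift = remaining.bit_length() - divisor.bit_length()
        if (divisor << shift) > remaining:   # one place too far: back off
            shift -= 1
        remaining -= divisor << shift
        partials.append(1 << shift)          # partial quotient: 1 then `shift` zeros
    return sum(partials), remaining, partials


quotient, remainder, partials = binary_long_division(1026, 27)
print([format(p, "b") for p in partials])          # ['100000', '100', '10']
print(format(quotient, "b"), quotient, remainder)  # 100110 38 0
```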


We’ll start with a question from 1999:

**Adding and Subtracting Roman Numerals**

I would like to know how to add and subtract Roman Numerals without converting them to regular numbers. I tried looking for patterns in different problems like XXX + X = XXXX, which I know is written XL. A problem I can't find a pattern for is LX - XIV = XLVI. What did the Romans do to solve this kind of problem?

Doctor Rick answered:

Hi, Greg. You might want to look at the Roman Numerals page in our Dr. Math FAQ:

http://mathforum.org/dr.math/faq/faq.roman.html

I have read that Europeans didn't switch to Hindu-Arabic numerals for a long time because they didn't see a reason to do arithmetic on paper, using written numerals. They would set up the numbers on an abacus, do the math, then write down the answer. (It's a little like using a calculator and not bothering to learn to do arithmetic by hand.) Roman numerals are closely related to the abacus; that was one reason they liked them.

We discussed this relationship of Roman numerals to the abacus last time. The FAQ (before showing how to use the abacus!) gives two examples of how one might do addition, which I’ll reformat here for readability:

Let’s start with an addition problem: 23 + 58.

In Roman numerals, that’s XXIII + LVIII.

We’ll begin by writing the two numbers next to each other: XXIII LVIII. Next, we rearrange the letters so that the numerals are in descending order: LXXVIIIIII. Now we have six Is, so we’ll rewrite them as VI: LXXVVI. The two Vs are the same as an X, so we simplify again and get LXXXI, or 81, as our final answer.

Note that this had no subtractive parts, so the sum is just the sum of all the symbols, which we could rearrange at will. All the work was mere simplification.

Now let’s try another addition problem: 14 + 17, or XIV + XVII.

Notice that the I in XIV is being subtracted, so this problem is going to be a little more complicated.

We begin the way we did before, by writing the numbers side by side: XIV XVII. The subtracted I in XIV cancels out another I, so we cross them both out: X~~I~~V XVI~~I~~. Next we put the remaining letters into the right order: XXVVI. Simplifying gives us XXXI, or 31.

Would you enjoy doing this? Doctor Rick showed a subtraction, offering some ways to make both addition and subtraction easier:

If you want to add and subtract Roman numerals, I suggest you do a little bit of conversion first: get rid of the subtraction rule. That is, rewrite XIV as XIIII. Then you can subtract like this:

$$\begin{array}{rcrcc}\text{LX}&=&\text{XXXXX}&\text{V}&\text{IIIII}\\-\ \text{XIIII}&=&-\ \text{X}&&\text{IIII}\\\hline&&\text{XXXX}&\text{V}&\text{I}\end{array}$$

Now you can resume using the subtraction rule, so that the answer is XLVI. I "borrowed" or "regrouped," sort of the way we do it in regular subtraction: I changed L into XXXXX so there would be enough X's, and I changed X into VV and one of the V's into IIIII so there would be enough I's. You can make shortcuts by remembering that V - IIII = I, for instance.

The Romans probably never wrote work like this, but did equivalent things on the abacus:

This method is much like using an abacus. The abacus does not use the subtraction rule; 4 (IV) is represented by four beads (like IIII). An abacus uses the same principle of changing a bead in one column into two "5" beads in the column to the right, then changing a "5" bead into five "1" beads in the same column. When it comes to multiplication, forget it! The Egyptians had a similar kind of numeral system (without the subtraction principle), and they used a completely different method of multiplication from ours, one that works something like the way computers do it. You can find information on Egyptian multiplication by searching our Dr. Math Web site.

We’ll be taking that advice soon! But first we’ll try other ways.
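To make the symbol-shuffling concrete, here is a Python sketch of both ideas: adding by combining letters and then simplifying (as in the FAQ), and subtracting with Doctor Rick’s borrowing. It keeps a count of each letter, much like counters on a board. This is my own illustration, not a historical procedure; it assumes well-formed numerals and, for subtraction, that the first number is the larger. All names are hypothetical.

```python
# Letters from largest to smallest, with their values.
SYMBOLS = ["M", "D", "C", "L", "X", "V", "I"]
VALUES = [1000, 500, 100, 50, 10, 5, 1]

# Subtractive groups and their purely additive forms.  Longer additive
# forms (DCCCC, LXXXX, VIIII) come before the shorter ones they contain.
GROUPS = [("CM", "DCCCC"), ("CD", "CCCC"), ("XC", "LXXXX"),
          ("XL", "XXXX"), ("IX", "VIIII"), ("IV", "IIII")]


def to_counts(numeral):
    """Get rid of the subtraction rule (XIV -> XIIII), then count each letter."""
    for short, long in GROUPS:
        numeral = numeral.replace(short, long)
    return [numeral.count(s) for s in SYMBOLS]


def simplify(counts):
    """Trade five I's for a V, two V's for an X, and so on up the board."""
    counts = list(counts)
    for i in range(len(SYMBOLS) - 1, 0, -1):
        ratio = VALUES[i - 1] // VALUES[i]          # 5 or 2
        carry, counts[i] = divmod(counts[i], ratio)
        counts[i - 1] += carry
    return counts


def to_numeral(counts):
    """Write letters in descending order, then restore the subtraction rule."""
    numeral = "".join(s * n for s, n in zip(SYMBOLS, counts))
    for short, long in GROUPS:
        numeral = numeral.replace(long, short)
    return numeral


def roman_add(a, b):
    """Put both numbers together, then simplify."""
    return to_numeral(simplify([x + y for x, y in zip(to_counts(a), to_counts(b))]))


def roman_subtract(a, b):
    """Subtract letter by letter, borrowing (X -> VV, V -> IIIII, ...) as needed."""
    diff = [x - y for x, y in zip(to_counts(a), to_counts(b))]
    for i in range(len(SYMBOLS) - 1, 0, -1):
        ratio = VALUES[i - 1] // VALUES[i]
        while diff[i] < 0:          # not enough of this letter: break up a larger one
            diff[i - 1] -= 1
            diff[i] += ratio
    if diff[0] < 0:
        raise ValueError("the first numeral must be the larger one")
    return to_numeral(simplify(diff))


print(roman_add("XXIII", "LVIII"))   # LXXXI
print(roman_add("XIV", "XVII"))      # XXXI
print(roman_subtract("LX", "XIV"))   # XLVI
```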

Here is what the FAQ shows for addition:

When they wanted to do complicated arithmetic problems, the Romans used a special counting board or an abacus. A Roman counting board looked something like this:

Counters, such as pebbles, were placed in each column. The columns were often grooved, so that the pebbles wouldn't roll away. A pebble in the bottom half of the board meant one, ten, one hundred, or one thousand, depending on its placement. A pebble in the top half had its value multiplied by five. For example, here's 2564 (MMDLXIV) on the counting board.

Note that the Romans didn't worry about the subtraction principle unless they were actually writing their numbers down (and not always then). There were many variations on the Roman counting board, such as extra columns for larger numbers. People in the Middle Ages turned the columns the other way and drew lines down the middle, so that the board could hold several numbers at once.

This board was designed for flat counters. Counters were placed on the line for I, X, C, and M, and between lines for V, L, and D. The little x on the M line is like the modern comma in 1,000: it helps you remember the meaning of each numeral's position.

Let's use this board to add 23 and 58. The first step is to place the two numbers (XXIII and LVIII) on the board:

Next, we slide the counters together:

We can replace five of the counters on the I line by a counter in the V space, and the two counters in the V space by another counter on the X line, for a final answer of LXXXI.

It’s not hard to see how to follow the same procedure on the Roman version of the board, or on the abacus (which has attached beads or buttons rather than pebbles or counters). Here is a picture, from last time, of a replica abacus:

Here’s a question from a teacher in 2001, which is not about Roman numerals at all, but will lead to them:

**Place Value and Roman Numerals**

What is the importance of place value in relation to the value of a number? For example, where the number "2" is located in a number changes its valuation in the number, so how does one explain the importance of the place value in our number system? I have explored the realm of whole numbers, prime numbers, decimals, fractions, and mixed numbers, but cannot find anywhere to date, a cogent and readily ascertainable description or explanation of this phenomenon. Please explain so that I can address this problem with my class, as I cannot seem to describe this relation.

I answered:

Hi, James. The way to demonstrate the importance of place value is to demonstrate how to multiply two Roman numerals. You will quickly find that you have to learn the same facts in several different guises. For example, rather than just learning that 3 x 4 = 12, you have to learn III x IV = XII, XXX x IV = CXX, CCC x IV = MCC. Because Roman numerals have no place value, when you do the same thing in different digits, you have to say it different ways. With place value, you can continue this with numbers as large as you wish, just reusing the same ten symbols; Roman numerals stop at the thousands, because they would need two new symbols for every new decimal place you tack on.

So the Romans, if they tried to multiply as we do, would have needed to memorize huge multiplication tables. That’s why they considered the very idea ridiculous.

The FAQ, after showing how to add on the Roman counting board, shows how to use it to multiply.

Here's an example of multiplication. We'll multiply 116 by 32 (CXVI times XXXII). This board has space for three numbers, so we can keep track of partial results. We have to break the multiplication into steps. 32 = 30 + 2, so we're going to start by multiplying by 30. The first step is to make a copy of the bigger number, 116, on the other side of the board. Next we multiply our copy by 10, because 30 = 10*3. We can do this by sliding the counters up one full line. Now we have to multiply by 3: just triple the number of counters in the copy. We simplify our result, and remove the three X counters from the 32 section, to show that we have multiplied by 30. The next step is to multiply 116 by 2. We can do this by doubling the counters for 116 on the left-hand side of the board. We simplify again, and remove the 2 I counters, because we have finished multiplying by 2. As soon as we push the counters together, and simplify one last time, we are done. Our answer is MMMDCCXII, or 3712. This can be checked using a calculator, or by hand.

I imagine you still prefer the modern way, whether on paper or with a calculator!

As mentioned before, the Romans likely also used the Egyptian method, which is the subject of this 1996 question:

**Egyptian Method of Multiplication**

Have you ever heard of an Egyptian Method of Multiplication?

Doctor Jodi answered:

Hi Carrie! Yes, we have.... Here's a description of the method, from http://www-groups.dcs.st-andrews.ac.uk/~history/HistTopics/Babylonian_and_Egyptian.html Unlike the Greeks, who thought abstractly about mathematical ideas, the Egyptians were only concerned with practical arithmetic. In fact the Egyptians probably did not think of numbers as abstract quantities, but always thought of a specific collection of 8 objects when 8 was mentioned. To overcome the deficiencies of their system of numerals the Egyptians devised cunning ways around the fact that their numbers were unsuitable for multiplication, as is shown in the Rhind papyrus which date from about 1700 BC.

All of this is true of the Romans, too!

The Rhind papyrus recommends that multiplication be done in the following way. Assume that we want to multiply 41 by 59. Take 59 and add it to itself, then add the answer to itself and continue:

| 41 | 59 |
|---:|---:|
| 1 | 59 |
| 2 | 118 |
| 4 | 236 |
| 8 | 472 |
| 16 | 944 |
| 32 | 1888 |

Since 64 > 41, there is no need to go beyond the 32 entry. Now go through a number of subtractions: 41 - 32 = 9, 9 - 8 = 1, 1 - 1 = 0 to see that 41 = 32 + 8 + 1.

All the arithmetic you need is doubling, in each column, and then subtraction to find which doubles add up to our multiplier!

Next check the numbers in the righthand column corresponding to 32, 8, 1 and add them.

| 41 | 59 | |
|---:|---:|:---:|
| 1 | 59 | X |
| 2 | 118 | |
| 4 | 236 | |
| 8 | 472 | X |
| 16 | 944 | |
| 32 | 1888 | X |
| | 2419 | |

Notice that the multiplication is achieved with only additions; notice also that this is a very early use of binary arithmetic.

We found that \(41=32+8+1\), so $$59\cdot41=59(32+8+1)\\=59\cdot32+59\cdot8+59\cdot1\\=1888+472+59=2419$$

How is this binary? In splitting 41 into 32 + 8 + 1, we are effectively writing it as the binary number 101001. We’ll look at this more next week, in examining a related method in more depth.

It doesn’t matter which number you put on each column, though one way may involve more additions:

Reversing the factors we have:

| 59 | 41 | |
|---:|---:|:---:|
| 1 | 41 | X |
| 2 | 82 | X |
| 4 | 164 | |
| 8 | 328 | X |
| 16 | 656 | X |
| 32 | 1312 | X |
| | 2419 | |

Enjoy! Let us know if you need more help.

Here, \(59=32+16+8+2+1\), so $$41\cdot59=41(32+16+8+2+1)\\=41\cdot32+41\cdot16+41\cdot8+41\cdot2+41\cdot1\\=1312+656+328+82+41=2419$$
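Here is a minimal Python sketch of the Rhind papyrus procedure, assuming non-negative whole numbers (the name `egyptian_multiply` is my own). It returns the product and the powers of two that were checked, so you can compare both orders of the factors.

```python
def egyptian_multiply(multiplier, multiplicand):
    """Multiply the Rhind papyrus way: only doubling, subtracting, and adding."""
    # Double 1 and the multiplicand side by side until the next power of two
    # would exceed the multiplier.
    rows = []
    power, double = 1, multiplicand
    while power <= multiplier:
        rows.append((power, double))
        power, double = power * 2, double * 2
    # Subtract the largest power that fits, and check (X) that row.
    remaining, checked = multiplier, []
    for power, double in reversed(rows):
        if power <= remaining:
            remaining -= power
            checked.append((power, double))
    product = sum(double for _, double in checked)
    return product, [power for power, _ in checked]


print(egyptian_multiply(41, 59))   # (2419, [32, 8, 1])
print(egyptian_multiply(59, 41))   # (2419, [32, 16, 8, 2, 1])
```

The second call checks more rows, which is the "more additions" the answer mentions when the factors are swapped.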

A 1998 question is about Egyptian division, which the Romans would also have used:

**Egyptian Division**

My fourth graders are studying ancient Egypt. We have tried working with Egyptian multiplication but haven't found any reference to division. What format did the ancient Egyptians use for dividing? Can you help?

Doctor Mateo answered:

Hello Anne, Egyptian division is basically Egyptian multiplication in reverse. The divisor is repeatedly doubled to give the dividend. For example, 153 divided by 9.

| Powers of two | Divisor and successive doubling |
|---:|---:|
| 2^0 = 1 | 9 |
| 2^1 = 2 | 18 |
| 2^2 = 4 | 36 |
| 2^3 = 8 | 72 |
| 2^4 = 16 | 144 |
| 2^5 = 32 | 288 |

288 > 153 so you can stop. Look for the combination of numbers that add up to 153 in the divisor column. This can be like a puzzle for the students and an excellent way to teach the problem-solving method known as guess and check. The combination that works here is 144 + 9 since 144 + 9 = 153.

Here, rather than find which rows in the **first** column add up to the **multiplier**, we are finding which rows in the **second** column add up to the **dividend**.

To determine the quotient, look at the corresponding column of powers of two. Here we have: 2^4 corresponding with 144 and 2^0 corresponding with 9. So the quotient is 2^4 + 2^0 = 16 + 1 = 17.

Here, we found that the dividend is $$153=144+9=9\cdot16+9\cdot1=9(16+1)=9\cdot17,$$ so \(153\div9=17\).

What if we didn’t get an exact sum to 153?

The complication with Egyptian division comes with remainders. For example, 17 divided by 3.

| Powers of two column | Divisor doubling column |
|---:|---:|
| 2^0 = 1 | 3 |
| 2^1 = 2 | 6 |
| 2^2 = 4 | 12 |
| 2^3 = 8 | 24 |

24 > 17 so we can stop. Looking at the combinations of 3, 6, and 12 we see that 12 + 3 = 15 is the closest we can seem to get without going over 17.

I’m going to skip the rest, because the Egyptians had their own way to handle fractions, which is irrelevant to Roman numerals. I imagine the Romans would find a way to use unciae (twelfths).

We’ll close with a 2004 question on division:

**Division with Roman Numerals**

I'm having real problems trying to divide using a non-place system. I can think in the Hindu place system, but the concept of using letters rather than numbers is confusing me. Can you please give me an example of a division calculation?

I answered:

Hi, Trish. There is a demonstration of addition and subtraction in our FAQ:

- Roman numerals: http://mathforum.org/dr.math/faq/faq.roman.html

That also points out that the Romans and others probably wouldn't have bothered doing arithmetic with their numerals in the sense we think of doing it; they would put the number on an abacus and use that. The Egyptians did multiplication and division using a method of doubling and halving, which is described here:

- Egyptian Method of Multiplication: http://mathforum.org/library/drmath/view/57542.html
- Egyptian Division: http://mathforum.org/library/drmath/view/57574.html
- Egyptian Division, Shelley Walsh: http://faculty.ed.umuc.edu/~swalsh/Math%20Articles/EgyptDivide.html

That method didn't require memorizing multiplication tables (which would have to be different from one place to another). Actually, the idea that one could memorize such tables was considered preposterous, at least for most people, long after Hindu-Arabic numerals became popular.

(The reference to halving is to a variant of the method, called Russian Peasant Multiplication; we’ll look into that next time.)

Each operation can in principle be done in any numeral system by thinking in terms of place value. For example, to add CIV and CCXCII, you would arrange each number in columns (as on an abacus, or in a place-value system) and add the columns:

$$\begin{array}{rrrr}&\text{C}&&\text{IV}\\+&\text{CC}&\text{XC}&\text{II}\\\hline&\text{CCC}&\text{XC}&\text{VI}\end{array}$$

I can see why division would be the hardest to do this way; estimating a quotient requires some sort of familiarity with multiplication tables. Probably anything reasonable you do will amount to either translating (bit by bit) into Hindu-Arabic, or thinking in terms of an abacus, or using the Egyptian method. We could try to demonstrate several such techniques, but you get the idea: if you want to do as the Romans did, you won't bother!

But what would it look like to use the Egyptian method for division, **using Roman numerals**? Let’s have fun!

I'll just try the Egyptian division using Roman numerals, since you asked for a sample. Here is 153 / 9, as in the link above:

| Doubled from I | Doubled from IX | Used |
|---:|---:|:---:|
| I | IX | * |
| II | XVIII | |
| IV | XXXVI | |
| VIII | LXXII | |
| XVI | CXLIV | * |

$$\begin{array}{r}\text{C L III}\\-\ \text{C XL IV}\\\hline\text{IX}\end{array}\qquad\qquad\begin{array}{r}\text{X VI}\\+\ \text{I}\\\hline\text{X VII}\end{array}$$

I doubled starting at I and at IX until the next doubling would take me past CLIII; then subtracted each number in the second column that I could, marking those I used with *; then added the corresponding numbers in the first column to get my answer, XVII.

I have doubled repeatedly, starting with 1 and 9, then subtracted repeatedly from 153 to find that \(153=144+9\), just as in the example above. The doubling, and the subtractions, would undoubtedly be done on an abacus. I also suspect that they would skip the subtractive notation, and just write IIII and CXXXXIIII rather than IV and CXLIV. That takes less thought.


We’ll start with this 1997 question:

**Roman Numerals**

I am trying to learn how to read Roman Numerals. What does MCMLXXXVI mean?

Doctor Rob answered, starting with writing:

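Dear Robert - First you need to know the values of the letters, and then you need to know what the positions of the letters mean. Values:

| Letter | Value |
|:---:|---:|
| I | 1 |
| V | 5 |
| X | 10 |
| L | 50 |
| C | 100 |
| D | 500 |
| M | 1000 |
| \(\overline{\text{V}}\) | 5000 |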

Each symbol represents the same value no matter where it is placed (unlike our system, where the symbol “1” can mean one, ten, a hundred, etc.); but there are rules about how to place them.

When a letter is repeated one, two, or three times, add up the value that many times. XXX = 10 + 10 + 10 = 30. MM = 2000. V, L, and D cannot be repeated. I, X, C, and M can be repeated up to 3 times.

So the basic idea is to **add** values; but …

If you want to repeat a letter 4 times, instead use that letter preceding one of the two next larger values: For 4, don't use IIII, but instead IV (I subtracted from V). For 9, don't use VIIII or VIV, but instead IX (I subtracted from X). Similar rules apply for 40, 90, 400, 900.

These would be XL (10 less than 50 = 40), XC (10 less than 100 = 90), CD (100 less than 500 = 400), and CM (100 less than 1000 = 900). This subtractive scheme reduces the number of symbols from four to two.

Write the resulting groups in descending order.

794 = 500 + 200 + 90 + 4 = D + CC + XC + IV = DCCXCIV

Note that this only works up to 3999. For larger numbers, more letters would have to be assigned the values of 5000, 10000, and so on.

We’ll see more about larger numbers later.

To read a numeral, reverse this process. Start at the left, and read off groups which either consist of repetitions of a single letter, or one of the groups IV, IX, XL, XC, CD, CM (representing 4, 9, 40, 90, 400, or 900, respectively). You can recognize when these groups occur because the letters are not in descending order. Add up the values of those groups.

MCMLXXXVI = M + CM + L + XXX + V + I = 1000 + 900 + 50 + 30 + 5 + 1 = 1986

*Reading* becomes easier when you have experience *writing* Roman numerals.

This 1999 question gives a slightly different perspective:

Converting from Hindu-Arabic Numerals to Roman Numerals

I need to know the conversion for Roman numerals from our number system. I have searched in my local library, and on the Internet, but I can't seem to find an answer.

If we want to write, say, 2024 in Roman numerals, what do we do? Doctor Rick provided a straightforward procedure:

Hi, Debbie,

Here is how I convert our Hindu-Arabic numerals to Roman numerals. Convert one digit at a time. Each digit is converted the same way, except that the symbols are different:

- 1s digit: I for 1, V for 5
- 10s digit: X for 1 (10), L for 5 (50)
- 100s digit: C for 1 (100), D for 5 (500)
- 1000s digit: M for 1 (1000), nothing for 5 (5000); see below.

Hindu-Arabic numerals (our ordinary numbers, which came from India by way of Arabs) are built around place value. The Romans instead had two symbols for what we think of as “the ones” (\(\text{I}=1\), \(\text{V}=5\)), two for “the tens”, and so on.

These are the conversions for each digit:

| Digit | 1000s | 100s | 10s | 1s |
|---|---|---|---|---|
| 0 | nothing | nothing | nothing | nothing |
| 1 | M | C | X | I |
| 2 | MM | CC | XX | II |
| 3 | MMM | CCC | XXX | III |
| 4 | MMMM | CD | XL | IV |
| 5 | nothing | D | L | V |
| 6 | nothing | DC | LX | VI |
| 7 | nothing | DCC | LXX | VII |
| 8 | nothing | DCCC | LXXX | VIII |
| 9 | nothing | CM | XC | IX |

Put the symbols from each digit together in the same order as they were in our (Hindu-Arabic) numeral system: from left to right, largest to smallest.

These can largely be memorized, or you can learn the additive and subtractive patterns that produce them.

For example, let's convert 1999: 1000 => M, 900 => CM, 90 => XC, 9 => IX, which together give MCMXCIX.
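Doctor Rick's digit-by-digit procedure translates almost directly into code. The Python sketch below (the name `to_roman` is our own) hard-codes the four columns of his table and, like the table, only accepts 1 to 4999.

```python
# Doctor Rick's table: index = digit value, one list per place.
ONES      = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS      = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS  = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
THOUSANDS = ["", "M", "MM", "MMM", "MMMM"]  # no symbol for 5000, so stop at 4


def to_roman(n: int) -> str:
    """Convert 1..4999 to a Roman numeral, one decimal digit at a time."""
    if not 1 <= n <= 4999:
        raise ValueError("the table only covers 1 to 4999")
    return (THOUSANDS[n // 1000]
            + HUNDREDS[n // 100 % 10]
            + TENS[n // 10 % 10]
            + ONES[n % 10])


print(to_roman(1999))  # MCMXCIX
print(to_roman(2024))  # MMXXIV
```

A zero digit maps to the empty string, which is why the 0 in 2024 simply disappears.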

For my example of 2024, we think: “2000 = MM; 20 = XX; 4 = IV; so MMXXIV”. (Note that the 0 is ignored.)

We do the same sort of thing in *reading* Roman numerals, in reverse. Given \(\text{MCMXCIX}\), we pull the letters apart each time the symbol value goes down a place (e.g. from \(\text{M}=1000\) to \(\text{C}=100\)), to get \(\text{M,CM,XC,IX}\), and then translate each segment using the table above, or the addition and subtraction principles.

Notice that the system I just described only goes up to 4,999. Actually, sometimes you will see a symbol with a bar over it to represent 1000 times the usual value of a symbol. Thus, \(\overline{\text{V}}=5000\), \(\overline{\text{X}}=10{,}000\), \(\overline{\text{L}}=50{,}000\), ... But I think it's unconventional to have symbols with bars to the right of symbols without, or to intersperse them, as in \(4000=\text{M}\overline{\text{V}}\) or \(9000=\overline{\text{V}}\text{M}\overline{\text{V}}\). Instead, the barred numbers should be a solid block to the left of the unbarred numbers. You can write: \(3{,}859{,}429=\overline{\text{MMMDCCCLV}}\,\text{MMMMCDXXIX}\)
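Doctor Rick's bar convention can be sketched in Python too. This is our own reading of his example, an assumption rather than a documented rule: keep everything below 5000 unbarred (so MMMM is allowed, as in his table), and put the remaining multiple of 5000 under the bar, where each letter counts 1000 times its usual value. The Unicode combining overline (U+0305) stands in for the bar; `split_barred` and the other names are hypothetical.

```python
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS     = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
ONES     = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
BAR = "\u0305"  # combining overline: stands in for the bar over a letter


def plain(n: int) -> str:
    """Unbarred numeral in which the thousands are simply repeated M's."""
    return "M" * (n // 1000) + HUNDREDS[n // 100 % 10] + TENS[n // 10 % 10] + ONES[n % 10]


def split_barred(n: int) -> tuple[str, str]:
    """Return (barred block, unbarred block); barred letters count 1000 times.

    Assumption: the unbarred block keeps n % 5000, since there is no
    unbarred letter for 5000; the rest (a multiple of 5000) goes under the bar.
    """
    return plain(n // 5000 * 5), plain(n % 5000)


def with_bar(n: int) -> str:
    """Barred block (as overlined letters) followed by the unbarred block."""
    barred, unbarred = split_barred(n)
    return "".join(ch + BAR for ch in barred) + unbarred


print(split_barred(3_859_429))  # ('MMMDCCCLV', 'MMMMCDXXIX')
print(split_barred(9000))       # ('V', 'MMMM')
```

Under this split, 9000 comes out as a barred V followed by MMMM, a solid barred block on the left as recommended, rather than the interspersed form above.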

Again, more on this later!

Here’s more detail on subtraction, from 1999:

Subtracting Roman Numerals

What are the rules for the "subtraction components" in writing Roman Numerals?

When do you use subtraction in writing a number? This is not always made clear.

Doctor Rick answered again:

These are the limits on the use of the subtraction method, according to the modern rules of Roman numerals:

I. You can only subtract I, X, C, etc. (powers of 10; not V, L, or D).

II. You can only subtract a single letter from a single numeral (no IIX or IXX).

III. What you are subtracting cannot be any smaller than 1/10 of what you are subtracting it from. You can only subtract I from V or X, and X from L or C (MIM is not allowed).

So you only subtract a “1” (never a “5”); you subtract only one letter, from only one letter; and you stay within a “place” (I from V or X, X from L or C, C from D or M). Putting it negatively,

- You can’t subtract **a 5** (e.g. \(\text{LC}=100-50\) is invalid – that’s written as L alone).
- You can’t subtract **two 1’s** (e.g. \(\text{XXL}=50-20=30\) is invalid – that’s written as \(\text{XXX}\)).
- You can’t subtract **from a doubled letter** (e.g. \(\text{XCC}=200-10=190\) is invalid – that’s written as \(\text{CXC}\)).
- You can’t subtract **across places** (e.g. \(\text{XD}=500-10=490\) is invalid – that’s written as \(\text{CDXC}\)).

These restrictions prevent there being two ways to write the same number, and also prevent ambiguities in reading, like whether \(\text{XIIX}\) would mean \(10+8=18\), or \(11+9=20\), or \(12+10=22\).
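Because the rules work one place at a time, they can be packed into a single regular expression with one group per decimal place. This is a common formulation, not something from the answer; the sketch uses Python's standard `re` module and, following Doctor Rob's 3999 limit, allows at most MMM (changing `M{0,3}` to `M{0,4}` would accept Doctor Rick's MMMM). The name `follows_rules` is our own.

```python
import re

# One group per place: thousands, hundreds, tens, ones.
# Each non-thousands group is a 9 (CM, XC, IX), a 4 (CD, XL, IV),
# or an optional 5 followed by at most three 1's.
STRICT = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def follows_rules(numeral: str) -> bool:
    """True if numeral is non-empty and obeys the modern rules (1 to 3999)."""
    return numeral != "" and STRICT.fullmatch(numeral) is not None


for s in ["MCMXCIX", "CXC", "LC", "XXL", "XCC", "XD", "IMM", "XIIX"]:
    print(s, follows_rules(s))
```

Each invalid example from the list above fails because no single place-group can produce it.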

Here is a positive way to look at it. When converting an Arabic numeral to a Roman numeral, convert it one digit at a time. Write each piece as a Roman numeral, then stick them all together left to right. For instance, 1999 = 1000 + 900 + 90 + 9 = M + CM + XC + IX = MCMXCIX

This is the point of his table in the previous answer.

There was a strong temptation in 1999 to write the year as \(2000-1=\text{IMM}\). But that was wrong.

The rules make for longer numerals sometimes than you might make by breaking the rules. But they make numerals easier to read, because you can read the Arabic digits off in small groups (M CM XC IX) and the subtractions are easy. Even in ancient Rome, though the place-value system of Arabic numerals (and the zero necessary to make it work) had not been invented, people already thought in terms of decimal groups, the way an abacus works: each digit made up of a collection of ones and fives.

We’ll see this more later: Roman numerals are, to a great extent, representations of an abacus – *except* for the subtraction rule, which came late in their history. In fact, where we have said “5’s and 1’s”, they might think of “upper and lower parts” of an abacus.

In ancient times and even in the Renaissance, the rules were not very strict (any more than spelling rules were!). You could find examples that violate each of my rules. See these interesting sites:

- Roman Numerals: History and Use: http://www.deadline.demon.co.uk/roman/intro.htm
- Roman Numeral Date Conversion Guide: http://www2.inetdirect.net/~charta/Roman_numerals.html

Both links are now redirected to archived copies. The second of these has this to say:

In actual practice, neither ancient nor modern usage of Roman numerals has conformed rigidly to hard and fast rules. Even the subtraction principle, perhaps the most conspicuous feature of Roman numerals as we know them today, was applied only sporadically by the Roman themselves. Indeed, the appearance of a smaller numeral before a larger one in both ancient and medieval sources will often signify multiplication rather than subtraction. For example, VM for 5,000 or VIIC for 700 (also written as V.M and VIII.C, or with M and C as superscripts). Any number of other variant or alternative forms may also be found, especially in the imprint dates of books from earlier centuries. These forms include the use of the long versions of the numbers 400 (CCCC) or 40 (XXXX) — these were actually the preferred forms in ancient times and still appear in 20th-century books — as well as XXC for LXXX, IC for XCIX, VIX for XVI, or IIXX for XVIII, to mention only a few of the more obvious variant patterns.

But we will ignore these variations, and follow the rules as they were eventually (more or less) standardized!

Excel, for some reason, has options in its ROMAN conversion function with varying levels of “conciseness” (rule-breaking): For 3999, for example, it gives, successively:

- MMMCMXCIX following the rules;
- MMMLMVLIV allowing subtraction of 5’s (e.g. LM = 1000 – 50 = 950)
- MMMXMIX allowing subtraction skipping a “place” (e.g. XM = 1000 – 10 = 990)
- MMMVMIV allowing subtraction of 5’s skipping a “place” (e.g. VM = 1000 – 5 = 995)
- MMMIM allowing maximal subtraction (e.g. IM = 1000 – 1 = 999)

All of this, to my knowledge, is some programmer’s invention, and is never really used.
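All five strings still "mean" 3999 under the loosest possible reading rule: add each letter's value, but subtract it when a larger letter follows. The Python sketch below (our own function, not Excel's algorithm) applies that rule to the list above. It happily accepts numerals the modern rules forbid, which is exactly why those rules exist.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def permissive_value(numeral: str) -> int:
    """Add each letter, but subtract it when the next letter is larger.

    This does not enforce any of the modern rules; IM is read as 999.
    """
    total = 0
    for i, ch in enumerate(numeral):
        v = VALUES[ch]
        if i + 1 < len(numeral) and VALUES[numeral[i + 1]] > v:
            total -= v
        else:
            total += v
    return total


forms = ["MMMCMXCIX", "MMMLMVLIV", "MMMXMIX", "MMMVMIV", "MMMIM"]
print([permissive_value(s) for s in forms])  # [3999, 3999, 3999, 3999, 3999]
```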

Here is a question from 2015 inspired by the prominence of tens:

When in Rome, Know Your Place — Less a Written Notation for It

I apologize as this may not be strictly a math-related question, but yours was the most welcoming site for such a question. In your FAQ on Roman numerals, the second section asserts, "Here are the official rules for subtracting letters...." http://mathforum.org/dr.math/faq/faq.roman.html These rules -- on yours and various other sites -- readily explain how a value can be subtracted only from a neighboring value which is ten times the previous. For example, XLIX is 49; but IL shouldn't be used. I understand the operation of this rule, but not its development or history. It seems to me that the Roman numeral system didn't have nor take into account such things as a "ones place," a "tens place," and so on. I would think these would be traits exclusive to the Arabic numeral system. So how did it come to be that there are hard and fast rules about subtracting I from V or X, but not from L, C, D, or M? Can you show why such a system would have referred to "ten times" a value in a rule about subtraction?

I answered:

Hi, Ken.

The key to the question is that the Romans did indeed have a concept of place value. They just didn't take the step of using it in writing their numbers! The place where they used the idea was on the abacus. If you think of Roman numerals as a way to write down results obtained on their version of the abacus (or counting board), you can see that in fact they did write one "place" at a time. If they had thought of using the same set of symbols for each place, they would have had a positional notation like ours; but that would have required a way to represent zero, and they had not reached that stage of development.

Here is a replica of such an abacus:

The “fives” are at the top, and the “ones” are at the bottom, with each column a “place”. The columns are labeled I, X, C, and higher (using an early notation we’ll see below), and also for fractions (which are not tenths, but twelfths: “uncia”, from which we get “ounce” and “inch” – more on this below, too). For more on this device, see Roman Counting Instruments, or click on the picture. (The original manufacturer’s instruction manual has been lost, so we are not sure how they handled fractions!)

But the result is that when we write Roman numerals, we can always break them apart into pieces that correspond to our digits, as in MCMLXXIX = M CM LXX IX, where the chunks stand for the digits 1, 9, 7, 9. Each chunk tells what is in a column on the abacus. Crossing over between places would result in a number, like IL, that could not easily be represented on the abacus.

So their abacus is a perfectly good place value system; they just didn’t take the final step. But it does explain why they wrote numbers as they did.

So, is it really base 10? Here’s a question from 2004:

The Base of Roman Numerals

What base does the Roman Numeral system use? It appears to have two bases.

I answered, starting with a link to a previous answer of mine:

Hi, Emma.

This question is discussed here: Base of Roman Numerals http://mathforum.org/library/drmath/view/52587.html

The system is essentially base 10, since a numeral can always be broken into parts for each power of ten: M CM LX VII, for the digits 1 9 6 7. It can be described as a combination of bases 2 and 5, since the values of the symbols involved are either 2 or 5 times the value of the previous symbol:

| I | V | X | L | C | D | M |
|---|---|---|---|---|---|---|
| 1 | 5 | 10 | 50 | 100 | 500 | 1000 |
| | ×5 | ×2 | ×5 | ×2 | ×5 | ×2 |

Of course, these are not **place** values, but **symbol** values! There’s a big difference.

But that doesn't really make it base 2 or base 5, and since it is not a place-value system, the role of 2 and 5 is not very significant. No powers of 2 or 5 are involved, only powers of 10 times 1 or 5. That's why I prefer to think of it as a modified base-10 system influenced by base 5.

It is sometimes called “Bi-quinary coded decimal”, which is a very apt description.

It's interesting, though, that the abacus (which IS a place-value system) uses the same trick of splitting each decimal digit into two parts, one base 2 (two beads representing fives, only one of which is actually needed) and one base 5 (five beads representing ones). Roman numerals, apart from subtractive notation (as in IV for 4), represent well the state of such an abacus, with the digits corresponding to each power of ten showing how many 1's and how many 5's there are in that "digit".

Note that the Roman abacus, as shown above, used only one 5-bead and four 1-beads (actually sliding buttons) per column; the abacus I described in that answer was the Chinese one.
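To make the bi-quinary idea concrete, here is a Python sketch (function names are our own) that splits each decimal digit into 5's and 1's, as on the abacus, and writes the numeral with no subtractive pairs, in the additive style (IIII, CCCC) that the quoted history says was common in ancient times. It assumes the input stays below 5000, since there is no unbarred symbol for 5000.

```python
# (one, five) symbols for each place, from the ones column upward.
PLACES = [("I", "V"), ("X", "L"), ("C", "D"), ("M", None)]


def abacus_state(n: int) -> list[tuple[int, int]]:
    """For each place (ones first): (5-beads used, 1-beads used)."""
    state = []
    for _ in PLACES:
        digit = n % 10
        state.append((digit // 5, digit % 5))   # digit = 5 * fives + ones
        n //= 10
    return state


def additive_numeral(n: int) -> str:
    """Write the abacus state directly, with no subtraction (4 -> IIII)."""
    if not 1 <= n <= 4999:
        raise ValueError("no 5000 symbol, so stay within 1..4999")
    parts = []
    for (one, five), (fives, ones) in zip(PLACES, abacus_state(n)):
        parts.append((five if fives else "") + one * ones)
    return "".join(reversed(parts))             # largest place first


print(abacus_state(1967))      # [(1, 2), (1, 1), (1, 4), (0, 1)]
print(additive_numeral(1967))  # MDCCCCLXVII
```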

Here’s a question from 1997, introducing how bigger numbers are, and were, written:

Large Roman Numerals

Could you please convert 5000, 1,000,000, and 5,000,000 into Roman numerals?

Doctor Cheryl answered:

I had to look for a while to find the information I remembered about how to write really large numbers in Roman numerals. According to an old 1960 mathematics textbook, the Roman numeral system worked like this: I = 1, V = 5, L = 50, C = 100 and M = 1000. If a heavy bar was placed over the numeral that meant it was multiplied by 1000. A V with a bar over it would stand for 5000. An M with a bar would be 1,000,000. How would YOU write your last numeral: 5,000,000?

Presumably, the answer would be \(\overline{\text{MMMMM}}\). But is that legal?

That was asked in 2006, and added to this page:

How do you write a decimal number higher than 4,000,000? I found it to be interesting that the rules state that you can only use a particular letter 3 times max (if I'm not mistaken). Therefore if I use MMMM with a bar on top to signify 4,000,000, it would break the 3 letter max rule. The other way I can think of is to use two bars but would that be right?

Since other symbols besides M are limited to 3 (as we go from III to IV), does that also apply here, with no symbol for 5000 to subtract from?

I answered:

Hi, Arnold.

The basic answer is that the Romans rarely bothered to write such large numbers, so they never developed a convenient way to do it, just several ad-hoc tricks to extend the system a bit. One is the bar (though I think that was a later addition); another earlier trick was to put what we might call parentheses around a number to multiply it by 1000. That is, something like (|) was an early form of M=1000, and they would use ((|)) for 1,000,000 and so on. I don't know whether using double bars is considered valid, either in the sense of having been used historically, or of being commonly accepted today. But it makes sense, and I know some people use it when needed.

This early form for 1000 typically looks like **C|Ↄ**, or **ↀ**. The symbol for 1,000,000 was **ↂ**. We saw these (and more) on the abacus above.

The 3-times rule is not really a rule, and was not followed by the Romans themselves, who often wrote IIII for 4. The "rule" just arises from the fact that, once the subtraction rule was developed, it was not *necessary* to use more than 3 of anything. When you get up to MMMM, there is no alternative, so the rule does not really apply; it *is* necessary to repeat four or more times, if you choose not to use the bar. So my answer to your question would be \(\overline{\text{MMMM}}=4{,}000{,}000\), and I wouldn't be bothered by \(\overline{\overline{\text{IV}}}\) (IV with a double bar) either.

According to some sources, IV was avoided because the name of the God Jupiter was written as **IV**PITER, and they wanted to avoid offending him. (That is probably not true.)

Here is one discussion of the details I've mentioned: Roman Numerals - How They Work http://www.web40571.clarahost.co.uk/roman/howtheywork.htm#larger Look higher on the page for some examples of how the Romans broke the modern "rules".

In that link, we see this inscription as an example of the “(|)” and “|)” symbols:

We think of this as \(\text{MDLXXXIII}=1583\).

We’ll close with this, from 1998:

Decimals and Roman Numerals

Do you know of any method for representing Roman numerals in a floating point format? For example, does 10.5 = X.V?

“Floating point” is used in computers to represent non-whole numbers, and works like scientific notation; Richard is really asking about **decimals**.

Doctor Rick answered, focusing on what the Romans did:

Hello, Richard.

The Romans didn't have a standard way of writing fractions (or decimals). Usually, they just wrote out the appropriate word, such as "tres septimae" for three-sevenths. When they needed to do serious calculations with fractions, the Romans used the uncia, a unit that meant 1/12 of anything. There were names and symbols for different multiples of the uncia. For example, six unciae, or 6/12, made up the semis. The semis meant one-half, and its symbol was an S cut in half (this looks a lot like a backward 2.) Unfortunately, uncia symbols didn't follow any real system, and they were never entirely standardized. Jeff Miller's page on the "Earliest Uses of Symbols for Fractions and Decimals" has more information about the uncia: http://jeff560.tripod.com/fractions.html
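Working in twelfths is easy to sketch with Python's standard `fractions.Fraction`: multiply by 12 and split off whole units. The function name is our own, and it only handles non-negative amounts that are whole numbers of unciae; a fraction like three-sevenths is rejected, since it is not a whole number of twelfths.

```python
from fractions import Fraction


def to_unciae(x: Fraction) -> tuple[int, int]:
    """Split a non-negative amount into (whole units, unciae), 1 uncia = 1/12."""
    twelfths = x * 12
    if twelfths.denominator != 1:
        raise ValueError(f"{x} is not a whole number of unciae")
    return divmod(twelfths.numerator, 12)


print(to_unciae(Fraction(21, 2)))  # (10, 6): ten units and a semis
print(to_unciae(Fraction(3, 4)))   # (0, 9): nine unciae
```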

There *were* symbols, to some extent; here is a list from Wikipedia:

It's important also to understand that Roman numerals are not a place-value system; there is no ones place, tens place, etc., so there is no "place" for a decimal point. If I were to invent a system for writing fractional quantities in Roman numerals, other than writing a fraction with Roman numerals in the numerator and denominator, I would take a cue from the method, occasionally seen, of writing a horizontal bar over a Roman numeral to signify multiplication by 1000: \(\overline{\text{M}}=1000\times1000=1{,}000{,}000\), and use, say, a bar under a Roman numeral to signify division by 1000.

This is purely his invention. But it can be fun to extend ancient ideas. Imagine finding an inscription with this number:

$$\overline{\text{CXXIII}}\;\text{CDLVI}\;\underline{\text{DCCLXXXIX}}$$

That would, in Doctor Rick’s imaginary world, mean 123,456.789. But this would be utterly foreign to Caesar!
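Doctor Rick's imaginary notation is simple enough to sketch. Assuming inputs below one million with at most three decimal places (our restriction), the Python below splits a decimal string into three groups of up to three digits and converts each with the usual digit table; drawing the bar over the first block and under the last is left to typesetting. The names are hypothetical, and `decimal.Decimal` is used so that a value like 0.789 is handled exactly rather than as a binary float.

```python
from decimal import Decimal

HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS     = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
ONES     = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def below_1000(n: int) -> str:
    """Roman numeral for 0..999; 0 becomes the empty string."""
    return HUNDREDS[n // 100] + TENS[n // 10 % 10] + ONES[n % 10]


def rick_groups(x: str) -> tuple[str, str, str]:
    """(thousands to overline, units, thousandths to underline) for 0 <= x < 1,000,000."""
    millis = int(Decimal(x) * 1000)        # assumes at most 3 decimal places
    thousands, rest = divmod(millis, 1_000_000)
    units, thousandths = divmod(rest, 1000)
    return below_1000(thousands), below_1000(units), below_1000(thousandths)


print(rick_groups("123456.789"))  # ('CXXIII', 'CDLVI', 'DCCLXXXIX')
print(rick_groups("10.5"))        # ('', 'X', 'D')
```

In this scheme, Richard's 10.5 would be X followed by an underlined D (500 thousandths), not X.V.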

Having just discussed the Chain Rule and the Product and Quotient Rules, a recent question about implicit differentiation (which we covered in depth two years ago) fits in nicely. This raises an important issue: when you get an apparently wrong answer, you may just have done something wise that caused your (correct) answer to differ from the book’s! We’ll also look at a three-year-old question as a further illustration of how these work.

This question came from Amia in mid-December:

Hi doctors,

I’m trying to solve the question attached:

I had a different answer than the answer in the book which is y/2x.

I can’t simplify my answer to be the same as the book.

Is there any mistake in my solution?

In implicit differentiation, we are given an equation that implicitly defines a function (or several functions), and differentiate both sides of the equation with respect to *x*, using the chain rule; then we solve for the desired derivative, \(\frac{dy}{dx}\).

What he has done is first to clear fractions by multiplying both sides by \(xy^2\), $$\frac{x}{y^2}+\frac{y^2}{x}=5\\\frac{x}{y^2}xy^2+\frac{y^2}{x}xy^2=5xy^2\\x^2+y^4=5xy^2$$

This simplifies the equation considerably, eliminating the need for the quotient rule. But we’ll see that it also complicates things …

Then he differentiated each term, using the power rule, $$\left(x^2\right)’=2x,$$ and the chain rule, $$\left(y^4\right)’=4y^3y’,$$ and the product rule, $$\left(5x\cdot y^2\right)’=\left(5x\right)’\left(y^2\right)+5x\left(y^2\right)’\\=5y^2+5x\left(2yy’\right)=5y^2+10xyy’.$$

Then he collected terms containing \(y’\), factored, and divided: $$2x+4y^3y’=5y^2+10xyy’\\4y^3y’-10xyy’=5y^2-2x\\(4y^3-10xy)y’=5y^2-2x\\y’=\frac{5y^2-2x}{4y^3-10xy}.$$

We could try to simplify by factoring, but it clearly isn’t equivalent to the book’s answer: $$y’=\frac{5y^2-2x}{4y^3-10xy}=\frac{5y^2-2x}{2y(2y^2-5x)}.$$

I answered:

Hi, Amia.

This is not unusual. When I help a student with implicit differentiation, I generally tell them to do that and nothing else; any simplification you make before differentiating will change the result so it will not look like the book’s answer, even though what you do is an obviously good thing to do. So answer #1 is, just differentiate as given, and you will get their answer.

We saw the same phenomenon in A Surprising Route to a Differential Equation, where (under Alternative 4) I said

**Implicit differentiation can result in very different looking formulas** when you process the problem differently before differentiating. This is because the implicit derivative by nature doesn’t explicitly take into account the fact that *x* and *y* are related; if we were to solve for *y* and plug that into the different expressions we get for the derivative, to get a formula in terms of *x* (rather than both *x* and *y*), we would find that they agree. [This is a danger in assigning implicit differentiation problems, because anything special that you do before differentiating, no matter how wise, may result in your answer being different from what the teacher expects!]

Clearing fractions is a typical good idea gone wrong that I often see students try; it is “wrong” only in that it will give *different* results than a teacher expects, not necessarily that those results are always *more complicated* or *less useful* than what we get directly.

So let’s differentiate the equation as given:

$$\frac{x}{y^2}+\frac{y^2}{x}=5$$

Combining into a single fraction, $$\frac{x^2+y^4}{xy^2}=5$$

By the quotient rule (and all the other rules!), $$\frac{\left(2x+4y^3y’\right)\left(xy^2\right)-\left(x^2+y^4\right)\left(y^2+x\left(2yy’\right)\right)}{\left(xy^2\right)^2}=0\\\frac{\left(2x^2y^2+4xy^5y’\right)-\left(x^2y^2+2x^3yy’+y^6+2xy^5y’\right)}{x^2y^4}=0\\\frac{x^2y^2-2x^3yy’-y^6+2xy^5y’}{x^2y^4}=0$$

Solving for \(y’\), $$x^2y^2-2x^3yy’-y^6+2xy^5y’=0\\x^2y^2-y^6=(2x^3y-2xy^5)y’\\y’=\frac{x^2y^2-y^6}{2x^3y-2xy^5}=\frac{y^2(x^2-y^4)}{2xy(x^2-y^4)}=\frac{y^2}{2xy}=\frac{y}{2x}$$

And that’s the book’s answer.

Answer #2 is, your result is correct; that, together with the original equation, is equivalent to y/(2x). (This is important: Your expression can’t be simplified by itself to theirs; they are only the same for (x, y) on the given curve.)

Sometimes it is possible to use the given equation to rewrite our “bad” answer and get a nicer form (that might even match the book). I didn’t try this at the time, but I see some interesting features of our original solution that make it worth doing.

We have the equation \(x^2+y^4=5xy^2\) and the derivative \(\frac{5y^2-2x}{2y(2y^2-5x)}\). Both the numerator and the denominator contain terms that look close to \(5xy^2\); what if we multiply the numerator by \(x\) and the denominator by \(y^2\), and compensate by multiplying the whole by \(\frac{y^2}{x}\)?

$$y’=\frac{\left(5y^2-2x\right)x}{2y\left(2y^2-5x\right)y^2}\cdot\frac{y^2}{x}\\=\frac{5xy^2-2x^2}{2y\left(2y^4-5xy^2\right)}\cdot\frac{y^2}{x}$$

Now we can replace \(5xy^2\) with \(x^2+y^4\), since these are the same for any point on the curve, and see what we get:

$$y’=\frac{5xy^2-2x^2}{2y\left(2y^4-5xy^2\right)}\cdot\frac{y^2}{x}\\=\frac{\left(x^2+y^4\right)-2x^2}{2y\left(2y^4-\left(x^2+y^4\right)\right)}\cdot\frac{y^2}{x}\\=\frac{y^4-x^2}{2y\left(y^4-x^2\right)}\cdot\frac{y^2}{x}\\=\frac{1}{2y}\cdot\frac{y^2}{x}=\frac{y}{2x}$$

We did it! We’ve shown that *at any point on the curve*, the two answers are equivalent.

But we can’t always expect to see such a path to a desired answer; we were very lucky. In the absence of that discovery, I suggested a more generally useful approach when your answer doesn’t match the book:

Now, how can you show that? You want to find a way to use the original equation to obtain theirs from yours, which is typically difficult. What you can do instead is to set your derivative equal to theirs, and “solve”. You’ll find at some point that you will have an equation related to (perhaps equivalent to) the given equation. And if you wanted to, you could reverse this work in order to obtain their answer from yours, or vice versa. (I’m not sure whether this always works, but it does in my experience, when I’ve tried. I successfully did it here.)

This idea is similar to proving an identity in trigonometry.

So here’s the work:

We want to show that $$\frac{5y^2-2x}{2y(2y^2-5x)}=\frac{y}{2x}$$

We can clear fractions (or, equivalently, cross-multiply), assuming both denominators are non-zero, and then “solve”:

$$\frac{5y^2-2x}{2y(2y^2-5x)}=\frac{y}{2x}\\2x\left(5y^2-2x\right)=y\cdot2y\left(2y^2-5x\right)\\10xy^2-4x^2=4y^4-10xy^2\\20xy^2=4x^2+4y^4\\5xy^2=x^2+y^4$$

But we already know the last line is equivalent to the given equation; so when the equation is true, the two claimed derivatives are equal. And if you read upward from the last line, you actually derive the equivalence of the derivatives from the equation.
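A quick numerical spot-check agrees. The Python sketch below (names are our own) finds points on the curve by treating \(x^2+y^4=5xy^2\) as a quadratic in \(y^2\) and taking the larger root, confirms each point satisfies the original equation, and then evaluates both derivative formulas there. This only samples a few points; it illustrates, rather than replaces, the algebra above.

```python
import math


def amia(x, y):
    """Derivative obtained after clearing fractions."""
    return (5 * y**2 - 2 * x) / (2 * y * (2 * y**2 - 5 * x))


def book(x, y):
    """Derivative obtained by differentiating the equation as given."""
    return y / (2 * x)


def points_on_curve(xs):
    """For each x > 0, one point with x^2 + y^4 = 5xy^2 (larger root for y^2)."""
    pts = []
    for x in xs:
        b, c = -5 * x, x * x              # y^4 + b*y^2 + c = 0, a quadratic in y^2
        y2 = (-b + math.sqrt(b * b - 4 * c)) / 2
        pts.append((x, math.sqrt(y2)))
    return pts


for x, y in points_on_curve([0.5, 1.0, 2.0, 7.0]):
    assert abs(x / y**2 + y**2 / x - 5) < 1e-9   # really on the curve
    print(f"({x}, {y:.4f}): {amia(x, y):.6f} vs {book(x, y):.6f}")
```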

In principle, we might also find the derivative **explicitly**, though this only works when you can solve for one variable:

Alternatively, the given equation can be solved for y (up to a point), and you could probably show that both answers are equivalent to the explicit derivative. (I haven’t even tried this, which would be hard, and is not usually an option.)

I said “up to a point” because we actually get multiple values for *y*; this isn’t a function. But it turns out that this still works.

Here’s how:

We want to solve \(x^2+y^4=5xy^2\) for *y*, which we can do with a little thought:

$$y^4-5xy^2+x^2=0\\y^2=\frac{5x\pm\sqrt{\left(-5x\right)^2-4x^2}}{2}=\frac{5x\pm\sqrt{21x^2}}{2}=\frac{5\pm\sqrt{21}}{2}x$$

(Here \(\sqrt{21x^2}=\sqrt{21}\,x\), because the original equation forces \(x>0\).)

Hmmm … that’s a parabola! No, two parabolas, with different parameters. (I’ll graph them in a moment.) Do you recall what I said at the start, that an implicit equation “defines a function (or several functions)”? This is what I was talking about. Even if we solved for *x* as a function of *y*, we’d have two of them.

Furthermore, we can guess that if we change the 5 to another number, we’d just get another parabola of the form \(y^2=Ax\), or, if you prefer, \(x=\frac{1}{A}y^2\). And what is the derivative of such a parabola? $$\frac{d}{dx}\sqrt{Ax}=\frac{d}{dx}\left(Ax\right)^{1/2}=\frac{1}{2}\left(Ax\right)^{-1/2}\cdot A=\frac{A}{2\sqrt{Ax}}=\frac{\sqrt{Ax}}{2x}.$$

And this is \(\frac{y}{2x}\)! So, yes, we get the same result – and not just on our curve, but on any of this **family** of curves with different values of *A*.
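The claim that every member of the family \(y^2=Ax\) has slope \(\frac{y}{2x}\) is easy to spot-check with a central-difference estimate of the derivative. In this Python sketch the values of *A* are arbitrary choices of ours, and only the upper branch \(y=\sqrt{Ax}\) is used.

```python
import math


def numeric_slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def upper_branch(A):
    """y = sqrt(A x), the upper half of the parabola y^2 = A x."""
    return lambda x: math.sqrt(A * x)


for A in [0.25, 1.0, 4.0, 9.0]:          # arbitrary members of the family
    f = upper_branch(A)
    x = 2.0
    print(A, numeric_slope(f, x), f(x) / (2 * x))
```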

In preparing for publication (and before solving as I just did above), I tried graphing the equation on Desmos, to see if that would give any insight. (This is something you can do even if you can’t solve explicitly for *y* – software can do amazing things!) I was surprised to see the two parabolas:

What we have determined so far is that at every point on this curve, \(y’=\frac{y}{2x}\); we could check this at a few points if we doubted our answer. For example, it approximately passes through \((2,3.1)\); there \(\frac{y}{2x}=\frac{3.1}{4}=0.775\), which looks about right. Here is that line, which appears to be tangent:

But what about other points **not on the curve**? They will lie on other members of the **family** of curves given by $$\frac{x}{y^2}+\frac{y^2}{x}=k,$$ where we allow the constant on the right-hand side to vary. Here is part of that family:

The purple broken curve represents \(k=2\); the other curves are for \(k={\color{Blue}3},{\color{Green}5},{\color{DarkOrange}7},{\color{Red}9}\). (You can imagine what might happen for larger or negative values of *k*. It does.) At any point on one of these curves, the derivative of that curve is given by \(y’=\frac{y}{2x}\).
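Off the original curve we can check this with the standard implicit-differentiation formula \(\frac{dy}{dx}=-\frac{F_x}{F_y}\) for a level curve \(F(x,y)=k\). The Python sketch below estimates both partial derivatives of \(F=\frac{x}{y^2}+\frac{y^2}{x}\) by central differences at a few points of our choosing (off the \(k=5\) curve, and off \(y^2=x\), where \(F_y=0\)) and compares with \(\frac{y}{2x}\).

```python
def F(x, y):
    """Left-hand side of x/y^2 + y^2/x = k; each value of k gives one curve."""
    return x / y**2 + y**2 / x


def implicit_slope(F, x, y, h=1e-6):
    """dy/dx = -F_x / F_y, with both partials estimated by central differences."""
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return -Fx / Fy


for x, y in [(4.0, 3.25), (1.0, 3.0), (2.0, 0.5), (5.0, 1.0)]:
    print(f"k = {F(x, y):.3f}: {implicit_slope(F, x, y):.6f} vs {y / (2 * x):.6f}")
```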

Our original form for the derivative works only on our original curve, because the 5 is part of it. When we cleared fractions, the constant 5 was multiplied by \(xy^2\); in the straightforward approach that constant differentiates to zero and disappears, but here it gets baked into the formula, making the formula work only on that particular member of the family.

For example, at the point \((4,3.25)\), it gives \(\frac{5y^2-2x}{2y(2y^2-5x)}=\frac{5(3.25)^2-2(4)}{2(3.25)(2(3.25)^2-5(4))}=\frac{44.8125}{7.3125}\approx6.128\), which is clearly wrong:

But the other formula gives \(\frac{y}{2x}=\frac{3.25}{2\cdot4}=0.40625\), which looks about right:

So was our original derivative *wrong*, in the sense that it doesn’t work for the whole family? No, it just represents a *different family*. Here are members of the family \(x^2+y^4=5xy^2+k\), for \(k=-100\) to \(100\). We see that the line we got for \((4,3.25)\) looks right this time:

(Our point fits \(k\approx-84\), and the blue curve near it is for \(k=-80\).)
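The numbers for \((4,3.25)\) can be reproduced directly. This Python sketch evaluates both derivative formulas there, and also which member of each family passes through the point: \(k\) in \(x^2+y^4=5xy^2+k\) for Amia's family, and \(k\) in \(\frac{x}{y^2}+\frac{y^2}{x}=k\) for the book's family.

```python
x, y = 4.0, 3.25

amia = (5 * y**2 - 2 * x) / (2 * y * (2 * y**2 - 5 * x))   # formula after clearing fractions
book = y / (2 * x)                                         # formula from the equation as given
k_amia = x**2 + y**4 - 5 * x * y**2    # member of x^2 + y^4 = 5xy^2 + k through the point
k_book = x / y**2 + y**2 / x           # member of x/y^2 + y^2/x = k through the point

print(round(amia, 3), book)       # 6.128 0.40625
print(k_amia, round(k_book, 3))   # -83.68359375 3.019
```

So the point sits near the \(k=3\) curve of the first family, and near \(k=-84\) in the second.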

Looking for other questions about implicit differentiation, I found this from Lance in early 2021:

Hi,

I am given a formula 0 = 2x^2 + xy – y^2 – 6x + 3y

I have to

a) use implicit differentiation to find the derivative

b) evaluate dy/dx at the point (2,4)

c) determine the equation of the tangent to the curve at the point (2,4) expressing my answer in the form y=mx + b

When I got to (c) my answer was y=2x which looked a little straightforward.

I have attached my workings below. I just wanted to check what I have done is correct and that I did not miss anything.

Is his equation for the tangent line, \(y=2x\), too simple to be correct? No. We’ll see that it’s exactly as simple as it should be.

Here is his work for part (a):

He used the chain rule and the product rule to differentiate:

$$2x^2+xy-y^2-6x+3y=0$$

$$\left(2x^2\right)’+\left(xy\right)’-\left(y^2\right)’-\left(6x\right)’+\left(3y\right)’=(0)’\\\left(4x\right)+\left(x’y+xy’\right)-\left(2yy’\right)-\left(6\right)+\left(3y’\right)=0\\4x+y+xy’-2yy’-6+3y’=0\\4x+y-6+\left(x-2y+3\right)y’=0\\y’=-\frac{4x+y-6}{x-2y+3}$$

Looks good. There was no reason to modify the equation before differentiation, so we don’t have that issue here.

Here is the rest of the work:

He calculated \(y’\) at \((2,4)\) twice to double-check it, and found (using my form) that $$y’=-\frac{4x+y-6}{x-2y+3}=-\frac{4(2)+(4)-6}{(2)-2(4)+3}=-\frac{6}{-3}=2$$

So for the tangent, we want the equation for a line through \((2,4)\) with slope \(m=2\). He did this, too, two ways. On the left, he uses the **point-slope form** \(y-y_1=m\left(x-x_1\right)\) for a line with slope *m* through point \((x_1,y_1)\): $$y-y_1=m\left(x-x_1\right)\\y-4=2\left(x-2\right)=2x-4\\y=2x$$

On the right, he uses **slope-intercept form**, which I know by the American form, \(y=mx+b\); he substitutes the coordinates for the known point, and solves for the unknown *y*-intercept. In my terms, this is $$y=mx+b\\4=2(2)+b\\4=4+b\\b=0$$ so the equation is, again, \(y=2x\).
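Both of Lance's calculations can be repeated in a few lines. In this Python sketch the function name `slope` is our own; it evaluates his implicit derivative at \((2,4)\), solves \(y=mx+b\) for \(b\), and checks that the point really lies on the curve.

```python
def slope(x, y):
    """Lance's implicit derivative of 2x^2 + xy - y^2 - 6x + 3y = 0."""
    return -(4 * x + y - 6) / (x - 2 * y + 3)


x1, y1 = 2, 4
assert 2 * x1**2 + x1 * y1 - y1**2 - 6 * x1 + 3 * y1 == 0   # (2, 4) is on the curve
m = slope(x1, y1)      # -(8 + 4 - 6)/(2 - 8 + 3) = -6/-3
b = y1 - m * x1        # slope-intercept form solved for b
print(f"y = {m}x + {b}")   # y = 2.0x + 0.0
```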

Everything is correct, and doubly so. All that’s needed is some encouragement, it seems.

I answered, not needing to say much, but choosing to make a graph (as above) as a quick check:

Hi, Lance.

Yes, the work is all good.

Now, I put the equation into Desmos thinking to show you that your tangent line is correct, and got a little surprise. Here is the graph:

Now that’s interesting! The tangent line is actually part of the “curve” itself.

The equation had the form of a general conic section, but the graph is a special case.

We saw a similar surprise in the last section of Implicit Differentiation: Explanation, Examples, and a Surprise, which also involved two lines.

It turns out that the graph of the equation is not an ellipse or hyperbola as I expected at first glance, but a degenerate conic section consisting of two lines, so your tangent at (2, 4) is actually part of the graph itself! But it is clearly the right “tangent”.

We discussed degenerate conic sections in Degenerate Conics I: Mystery of the Missing Case; this is the case of a pair of intersecting lines. Since the graph consists of two lines, each of those lines is itself a tangent.

So it turns out that the equation can be factored as (2x – y)(x + y – 3) = 0.

This factored form is not immediately obvious from the general form \(2x^2+xy-y^2-6x+3y=0\); I probably just read it off from the graph, writing each line’s equation in standard form \(ax+by+c=0\), multiplying the left-hand sides, and checking that the product was correct. There are other tricks I could have used, one being to rotate the graph to eliminate the *xy* term, and then complete the squares.

So the two lines are \(2x-y=0\) and \(x+y-3=0\), which become $$y=2x\\x+y=3$$ The first of these is our tangent line.
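The "check that the product was correct" step can also be done by machine. Both sides are polynomials of degree at most 2 in each variable, so if they agree on a 3-by-3 grid of integer points they are identical; this Python sketch checks a larger 7-by-7 grid with exact integer arithmetic, and confirms that both factors vanish at the intersection point \((1,2)\).

```python
def general(x, y):
    return 2 * x**2 + x * y - y**2 - 6 * x + 3 * y


def factored(x, y):
    return (2 * x - y) * (x + y - 3)


grid = range(-3, 4)
print(all(general(x, y) == factored(x, y) for x in grid for y in grid))  # True
print(2 * 1 - 2, 1 + 2 - 3)   # 0 0: both lines pass through (1, 2)
```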

And your derivative y’ = (6 – 4x – y)/(3 + x – 2y) agrees: When y = 2x, it simplifies to 2 (except when x = 1, when it is 0/0), and when y = 3 – x, it simplifies to -1 (except when x = 1).

So it’s a more interesting problem than either of us realized!

Here I used the equations for the separate lines to show that his formula for the derivative simplifies to the known slope of each line, on that line. When we replace *y* with \(2x\), we get $$y’=-\frac{4x+y-6}{x-2y+3}=-\frac{4x+2x-6}{x-2(2x)+3}\\=-\frac{6x-6}{-3x+3}=-\frac{6(x-1)}{-3(x-1)}=2$$ and when we replace *y* with \(3-x\), we get $$y’=-\frac{4x+y-6}{x-2y+3}=-\frac{4x+(3-x)-6}{x-2(3-x)+3}\\=-\frac{3x-3}{3x-3}=-1$$

In both cases, we canceled \((x-1)\), which is why the derivative is undefined (or, rather, indeterminate) at \(x=1\): It actually has *both* slopes at that point, where the lines intersect.
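To see the cancellation concretely, here is a short Python sketch using exact fractions (the helper names `slope_parts` and `slope` are hypothetical) that evaluates Lance’s derivative at points on each line and at the crossing point \((1,2)\).

```python
from fractions import Fraction

def slope_parts(x, y):
    """Numerator and denominator of y' = (6 - 4x - y) / (x - 2y + 3)."""
    return 6 - 4 * x - y, x - 2 * y + 3

def slope(x, y):
    num, den = slope_parts(x, y)
    return Fraction(num) / Fraction(den)   # raises ZeroDivisionError if den == 0

for x in [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(5, 3), Fraction(4)]:
    assert slope(x, 2 * x) == 2      # points of y = 2x (other than x = 1)
    assert slope(x, 3 - x) == -1     # points of y = 3 - x (other than x = 1)

print(slope_parts(1, 2))  # (0, 0): the indeterminate 0/0 where the lines cross
```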

As before, the equation can be modified by changing the right-hand side to any constant (which doesn’t affect the derivative), making the equation $$2x^2+xy-y^2-6x+3y=k$$ Here is the graph of members of this family for \(k={\color{Red}{-4}},{\color{DarkOrange}{-2}},{\color{DarkGreen}0},{\color{Blue}2},{\color{Purple}4}\):

This is a family of hyperbolas (rotated from the usual orientation).

And again, our derivative formula applies to any point in the plane, giving the slope of the curve through that point. For example, at \((2,2)\), the slope is $$-\frac{4x+y-6}{x-2y+3}=-\frac{4(2)+(2)-6}{(2)-2(2)+3}=-\frac{4}{1}=-4$$ which looks about right:
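As a numerical check of that slope, here is a Python sketch (a technique I’m adding for illustration, not part of the original exchange; the names `F`, `slope`, and `solve_y` are my own). It finds points on the level curve through \((2,2)\) just to the left and right by Newton’s method in *y*, then estimates the slope with a central difference.

```python
def F(x, y):
    return 2 * x**2 + x * y - y**2 - 6 * x + 3 * y

def slope(x, y):
    """Implicit derivative, valid for any curve F(x, y) = k."""
    return -(4 * x + y - 6) / (x - 2 * y + 3)

def solve_y(x, k, y_guess, iterations=50):
    """Newton's method in y for F(x, y) = k, starting from y_guess."""
    y = y_guess
    for _ in range(iterations):
        step = (F(x, y) - k) / (x - 2 * y + 3)   # dF/dy = x - 2y + 3
        y -= step
        if abs(step) < 1e-14:
            break
    return y

x0, y0 = 2.0, 2.0
k = F(x0, y0)   # picks out the member of the family through (2, 2)
dx = 1e-5
estimate = (solve_y(x0 + dx, k, y0) - solve_y(x0 - dx, k, y0)) / (2 * dx)
print(k, slope(x0, y0))   # 2.0 -4.0
print(estimate)           # very close to -4
```

So the point \((2,2)\) lies on the \(k=2\) curve, and the formula’s slope of \(-4\) agrees with the slope measured along that curve.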

Is there any way to change this formula into something simpler? I don’t know.

If we graph the equation as $$z=2x^2+xy-y^2-6x+3y$$ in three dimensions (using GeoGebra), we get a **hyperbolic paraboloid**, horizontal slices of which are the curves in this family. Here are three of the curves shown above (\({\color{DarkOrange}{z=-2}},{\color{Green}{z=0}},{\color{Blue}{z=2}}\)):

Our original equation gives the pair of lines in green, which pass through the “saddle point” of the surface.

Our implicit derivative equation gives tangents to these cross-sectional curves, which are the intersections of tangent planes to the surface with the cutting planes. There’s a lot more we could say, but I’ll stop there!

However, if you’re interested in the graph of the first problem, here it is, with cross sections at \({\color{Blue}{z=3}},{\color{Green}{z=5}}\):

The graph is inaccurate near the *z*-axis, where there is more happening than the software can handle.

As taught in, say, Stewart’s Calculus, the product rule is $$\frac{d}{dx}\left(f(x)g(x)\right)=f(x)\frac{d}{dx}[g(x)]+g(x)\frac{d}{dx}[f(x)],$$

and the quotient rule is $$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]=\frac{g(x)\frac{d}{dx}[f(x)]-f(x)\frac{d}{dx}[g(x)]}{[g(x)]^2}.$$

The latter, especially, is a little complicated. Can we make it easier to remember and to use?

We’ll start with this question from 2002:

Quotient Function Mnemonic I get really turned around when trying to factor or expand derivatives of quotient functions. Help me to differentiate the following: x/((x^2)-4)

Presumably, Ryan is getting the order of terms in the numerator wrong, which is a common issue.

Doctor Nbrooke answered:

Good evening, Ryan, and thanks for writing to Dr. Math. This is a very easy mistake to make, and one that I made quite a few times myself when I was in Calculus. Remember that the derivative of the quotient function has the form (f/g)' = [g*f' - f*g']/[g^2]

That is, $$\left(\frac{f}{g}\right)’=\frac{g\cdot f’-f\cdot g’}{g^2}$$ or $$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right)=\frac{g(x)\cdot\frac{df(x)}{dx}-f(x)\cdot\frac{dg(x)}{dx}}{g(x)^2}.$$

It’s even worse in words:

The derivative of the quotient of two functions

equals the denominator multiplied by the derivative of the numerator,

minus the numerator multiplied by the derivative of the denominator,

all divided by the denominator squared.

How can you remember what to differentiate where? Since it is a subtraction, the order matters!

However, it is much easier to remember LOW DEE HIGH MINUS HIGH DEE LOW, OVER LOW SQUARED, where LOW is the denominator of the quotient and HIGH is the numerator. So we'll organize our functions, using your example: LOW = x^2 - 4, DEE LOW = 2x, HIGH = x, DEE HIGH = 1. So we get (LOW DEE HIGH - HIGH DEE LOW)/(LOW SQUARED) = ((x^2-4) - x*2x)/(x^2-4)^2 = (x^2 - 4 - 2x^2)/(x^2-4)^2 = (-x^2 - 4)/(x^2 - 4)^2. And there we go. I hope this mnemonic device works for you.

I’ve heard many students recite this mnemonic, which is taught in their classes. Here is how a textbook we use (Briggs) puts it: $$\frac{LoD(Hi)-HiD(Lo)}{(Lo)^2}$$

I personally have trouble memorizing things like this, even when it is tied to a relatively familiar phrase (“hi-de-ho!”) – does it *start* with that, or *end* with that? I think it only works if you recite it often. So I like a couple other tricks.

Beyond the mnemonic, notice how he organizes the work, first writing each function and its derivative, and then putting them in place. I’ll do that again below.

Although I couldn’t find this in any old answers, what I myself do is to **change the order of factors** just a little, and make the **product rule** consistent with that, in order to strengthen the hold of the **pattern** on my mind: I find meaningful, or at least consistent, patterns far more memorable and useful than arbitrary memorization.

My version of the quotient rule is $$\left(\frac{f}{g}\right)’=\frac{f’\cdot g-f\cdot g’}{g^2}.$$

What I’ve done is to swap the order in the first term, so that rather than multiply a function times a derivative in each term, each term has *f* followed by *g* (top, then bottom), and we differentiate the first first, and the second second. So everything is first-then-second: derivative of top times bottom, minus top times derivative of bottom.

But I do more than that, by connecting it to the **product rule**. There, order doesn’t matter, since we’re adding, rather than subtracting; it’s just the sum of each factor times the derivative of the other. Since this order is *arbitrary*, I choose to remember that rule in a form that agrees with the *required* order for the quotient rule: $$\left(fg\right)’=f’\cdot g+f\cdot g’.$$

So I **always** differentiate the first (top or left) function first and the second (bottom or right) function second, and always keep the functions in the same order.

Now, I just checked several recent calculus books on my shelf, and found that they all give the quotient rule in the order we see in the questions and answers here, namely $$\left(\frac{f}{g}\right)’=\frac{g\cdot f’-f\cdot g’}{g^2},$$ while giving the product rule either in my form or as $$\left(fg\right)’=f\cdot g’+g\cdot f’$$ (or both). This latter form is why I used to get tangled up: The product rule differentiated the second function first, while the quotient rule differentiates the first function first. My formulation maintains consistency, which my mind likes a lot!

One book I have (Larson) gives my version of the product rule as an alternative, where the functions are kept in the same order, for a different reason. They comment that this is easier to extend to more than two factors: $$\left(fgh\right)’=f’\cdot g\cdot h+f\cdot g’\cdot h+f\cdot g\cdot h’.$$ (All three functions stay in the same order, and we differentiate each in turn.) My version of the quotient rule thus ties in with what may be the better form of the product rule anyway.
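The keep-the-order, differentiate-each-factor-in-turn pattern extends to any number of factors, which makes it easy to mechanize. Here is a minimal Python sketch, assuming we already know each factor’s value and derivative at a point (the function name `product_rule` and the sample factors \(x\), \(\sin(x)\), \(e^x\) are my own choices), checked against a numerical derivative:

```python
import math

def product_rule(values, derivatives):
    """Derivative of f1*f2*...*fn at a point, from each factor's value and derivative.

    The factors stay in order; term i differentiates factor i only,
    as in (fgh)' = f'gh + fg'h + fgh'.
    """
    total = 0.0
    for i in range(len(values)):
        term = 1.0
        for j in range(len(values)):
            term *= derivatives[j] if j == i else values[j]
        total += term
    return total

x = 0.7
values = [x, math.sin(x), math.exp(x)]          # f = x, g = sin x, h = e^x
derivatives = [1.0, math.cos(x), math.exp(x)]   # f', g', h'

def p(t):
    return t * math.sin(t) * math.exp(t)

step = 1e-6
numeric = (p(x + step) - p(x - step)) / (2 * step)
print(abs(product_rule(values, derivatives) - numeric) < 1e-6)  # True
```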

Furthermore, I see that Wikipedia (at least currently) writes both the product rule and the quotient rule my way. So I may be onto something. Also, the 1998 Foerster book (*Calculus: Concepts and Applications*) that was mentioned last week (which I have on a more distant shelf) agrees with me. It expresses the formulas as $$\left(uv\right)’=u’\cdot v+u\cdot v’$$ and $$\left(\frac{u}{v}\right)’=\frac{u’\cdot v-u\cdot v’}{v^2}.$$

Let’s use this formulation for the example in the previous section, \(f(x)=\frac{x}{x^2-4}\):

$$u=x,\;\;u’=1\\v=x^2-4,\;\;\;\;v’=2x$$ $$f'(x)=\frac{u’\cdot v-u\cdot v’}{v^2}\\=\frac{1\cdot(x^2-4)-x\cdot(2x)}{(x^2-4)^2}\\=\frac{x^2-4-2x^2}{(x^2-4)^2}=\frac{-x^2-4}{(x^2-4)^2}$$
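Here is a small Python sketch of the rule in this order (derivative of top times bottom, minus top times derivative of bottom), applied to \(\frac{x}{x^2-4}\) and checked both against the closed form \(\frac{-x^2-4}{(x^2-4)^2}\) (the result Doctor Nbrooke got above) and against a numerical derivative. The helper names are mine, for illustration only.

```python
def quotient_rule(u, du, v, dv):
    """Return (u/v)' as a function, in the order (u'*v - u*v') / v^2."""
    return lambda x: (du(x) * v(x) - u(x) * dv(x)) / v(x) ** 2

u, du = (lambda x: x), (lambda x: 1.0)
v, dv = (lambda x: x**2 - 4), (lambda x: 2 * x)

fprime = quotient_rule(u, du, v, dv)

def closed_form(x):
    return (-x**2 - 4) / (x**2 - 4) ** 2

def numeric_derivative(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

for x in [-3.0, -1.0, 0.0, 0.5, 3.0]:   # staying away from x = 2 and x = -2
    assert abs(fprime(x) - closed_form(x)) < 1e-12
    assert abs(fprime(x) - numeric_derivative(lambda t: u(t) / v(t), x)) < 1e-6
print(fprime(0.0))  # -0.25
```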

Next, from 1995:

Deriving the Quotient Rule Hi! I'm a Swedish student at Chalmers and my math problem is: how do I prove this derivation? (f/g)' = (gf' - fg')/g^2

This is a request for a proof, not for a method, but I’m going to take it further. (Note that this student likely used the word “derivation” meaning “differentiation”, but in fact we are going to *derive* the rule – these words confuse a lot of students!)

Doctor Ken answered:

Hello! You know, I don't think this is a dumb question at all! This stuff can be really tough and scary the first time you see it. I think the easiest way to do this derivation is to write f/g as f*g^(-1). Then you can use the product rule and the power rule to get the answer. In fact, since the quotient rule (what you're trying to derive) is kind of hard to remember, I usually end up re-deriving this every time I want to use it or teach it to someone. Good luck!

We’ll see the actual proof in a moment, but let’s do it for a specific quotient first, which is what I often do when a quotient can be easily rewritten as a product. For example, suppose we have to differentiate \(h(x)=\frac{\sin(x)}{x}\). If we use the quotient rule, we get

$$\frac{d}{dx}\left(\frac{\sin(x)}{x}\right)=\frac{\frac{d}{dx}\sin(x)\cdot x-\sin(x)\cdot\frac{d}{dx}x}{x^2}=\frac{x\cos(x)-\sin(x)}{x^2}.$$

If we aren’t sure of the order, we can do it this way:

$$\frac{d}{dx}\left(\frac{\sin(x)}{x}\right)=\frac{d}{dx}\left(\sin(x)\cdot x^{-1}\right)\\=\frac{d}{dx}\sin(x)\cdot x^{-1}+\sin(x)\cdot\frac{d}{dx}x^{-1}\\=\cos(x)\cdot x^{-1}+\sin(x)\cdot\left(-x^{-2}\right)\\=\frac{\cos(x)}{x}-\frac{\sin(x)}{x^2}.$$

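This is equivalent to the first answer, as we can see by rewriting the first term over the common denominator \(x^2\): $$\frac{\cos(x)}{x}-\frac{\sin(x)}{x^2}=\frac{x\cos(x)}{x^2}-\frac{\sin(x)}{x^2}=\frac{x\cos(x)-\sin(x)}{x^2}.$$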
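A quick numerical comparison of the two routes, as a Python sketch (the function names are illustrative): the quotient-rule form and the product-rule form with \(x^{-1}\) agree wherever \(x\ne0\).

```python
import math

def by_quotient_rule(x):
    # (d/dx sin x * x - sin x * d/dx x) / x^2
    return (math.cos(x) * x - math.sin(x)) / x**2

def by_product_rule(x):
    # d/dx (sin x * x^-1) = cos x * x^-1 + sin x * (-x^-2)
    return math.cos(x) * x**-1 + math.sin(x) * (-(x**-2))

for x in [0.3, 1.0, 2.5, -4.0]:
    assert math.isclose(by_quotient_rule(x), by_product_rule(x), rel_tol=1e-9)
print("both forms agree")
```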

It’s even easier when the function to be differentiated is, say, \(h(x)=\frac{5}{x}\), so that the top function is just a constant, and even the product rule is not needed. I often see students reflexively use the quotient rule, when they just have to do this: $$h'(x)=\frac{d}{dx}\left(5x^{-1}\right)=5\cdot\left(-x^{-2}\right)=\frac{-5}{x^2}$$

Now let’s use the same approach to prove the formula. This is from 2001:

Chain Rule: Prove the Quotient I've been asked to prove the quotient where u/v is uv^-1, using the product rule and the chain rule (for dv^-1/dx). I get this far: v^-1*(du/dx) + u(dv^-1/dx) but I don't know how to use the chain rule to find dv^-1/dx and then convert it all into the quotient rule. Could you help by giving me a step-by-step guide to the answer? Thank you!

I answered, making the (correct) work shown a little more readable, and continuing:

Hi, David. You applied the product rule to u * v^-1 and got d/dx (u * v^-1) = v^-1 * du/dx + u * d(v^-1)/dx. Now you have to apply the chain rule to find d(v^-1)/dx. You might find this clearer if you define a new variable w = v^-1. The chain rule says dw/dx = dw/dv * dv/dx. Since dw/dv = -v^-2, this gives d(v^-1)/dx = -v^-2 * dv/dx. You should be able to finish the work. If not, write back and show again how far you got.

Using a temporary variable can often help with complicated chain rule problems, as we saw last time, though it is not necessary.

Here’s the actual proof (using my preferred order):

$$\frac{d}{dx}\left(\frac{u}{v}\right)=\frac{d}{dx}\left(uv^{-1}\right)\\=\frac{du}{dx}\cdot v^{-1}+u\frac{d}{dx}\left(v^{-1}\right)\\=\frac{du}{dx}\cdot v^{-1}+u\left(-v^{-2}\right)\frac{dv}{dx}\\=\frac{\frac{du}{dx}}{v}-\frac{u\frac{dv}{dx}}{v^2}\\=\frac{\frac{du}{dx}v}{v^2}-\frac{u\frac{dv}{dx}}{v^2}\\=\frac{\frac{du}{dx}v-u\frac{dv}{dx}}{v^2}$$

Now let’s apply the rule, in this question from 1998:

When to Use the Chain and Quotient Rules Hi, I'm doing a maths subject called "Introductory Calculus" and learning about the various forms of differentiation. I've come across such questions as: Differentiate: f(x) = (2x + 1)^2 / (2x + 4). In situations like this, do I use the quotient rule first or the chain rule first?

Doctor Mateo answered:

Hello Matthew, In the specific example that you give, you begin by recognizing that the function is a rational function - it looks like a fraction with a variable in the denominator (the bottom). Because you are looking for the first derivative of a rational function, you begin by applying the quotient rule first. As you apply the quotient rule, you encounter a term in the numerator with a power. So what happens here is that you end up applying the chain rule in the process of doing the quotient rule.

The basic principle, as we saw last time, is that you can work “from the outside in”, differentiating the last operation in the expression (the “outside function”), and using whatever rules are needed as you come to them. The new thing is that this applies not only to functions, but to operations, which are essentially functions of two variables. We could picture the problem like this, using the boxes from last time:

$$\require{AMSmath}f(x)=\boxed{\frac{\boxed{\left(\boxed{2x+1}\right)^2}}{\boxed{2x+4}}}$$

Since the last operation performed in evaluating this is a division, you first apply the rule for division.

He gave two examples involving different orders, the first looking like the given problem, and the second being different, in order to demonstrate the idea more thoroughly (and also leave Matthew to do his own homework!).

Consider the two functions below: (I) g(x) = (4x - 5)^3 / (5 + 3x) and (II) h(x) = ((4x - 5)/(5 + 3x))^3. In (I) above, the exponent is part of the term in the numerator (top). The exponent does not go with the term in the denominator. In (II) above, the exponent is on the outside of the entire fraction, which means that it actually belongs to the numerator (top) and the denominator (bottom) because: h(x) = ((4x - 5)/(5 + 3x))^3 = (4x - 5)^3 / (5 + 3x)^3

So in I, the last operation performed is the division, while in II, the last operation is the exponentiation, so we start with that.

In (II) you would apply the chain rule first, since the exponent goes with the numerator and the denominator. Then you would do the quotient rule on the rational function inside. Alternatively, you could put the exponent with the numerator and the denominator then apply the quotient rule first, since you now have a rational function, but then you would have to use the chain rule twice.

That is, we *could* choose to rewrite it as he showed, though we don’t actually do that.

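Here’s the work for II, with the exponent on the outside (both here and in the work for I below, I use the denominator \(5-3x\) in place of the \(5+3x\) in Doctor Mateo’s examples; the method is the same either way):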

$$h(x)=\boxed{\left(\boxed{\frac{\boxed{4x-5}}{\boxed{5-3x}}}\right)^3}$$

$$h'(x)=3\left(\,\boxed{\frac{\boxed{4x-5}}{\boxed{5-3x}}}\,\right)^2\cdot\boxed{\frac{\boxed{4x-5}}{\boxed{5-3x}}}\,’\\

=3\left(\,\boxed{\frac{\boxed{4x-5}}{\boxed{5-3x}}}\,\right)^2\cdot\frac{\boxed{4x-5}\,’\cdot\boxed{5-3x}-\boxed{4x-5}\cdot\boxed{5-3x}\,’}{\left(\boxed{5-3x}\right)^2}\\

=3\left(\frac{4x-5}{5-3x}\right)^2\cdot\frac{4\left(5-3x\right)-\left(4x-5\right)\cdot-3}{\left(5-3x\right)^2}\\

=3\frac{\left(4x-5\right)^2}{\left(5-3x\right)^2}\cdot\frac{5}{\left(5-3x\right)^2}\\

=\frac{15\left(4x-5\right)^2}{\left(5-3x\right)^4}$$

In practice, I would work out the derivatives of the numerator and the denominator off to the side, rather than as part of the whole expression, and start writing at the third line of the work.
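To double-check the final line numerically, here is a Python sketch (the names are mine; it uses the denominator \(5-3x\), as in the work above) that compares the formula with a central-difference estimate of \(h'(x)\) at a few points away from \(x=\frac{5}{3}\).

```python
def h(x):
    return ((4 * x - 5) / (5 - 3 * x)) ** 3

def h_prime(x):
    # Chain rule on the outer cube, quotient rule on the inside, simplified.
    return 15 * (4 * x - 5) ** 2 / (5 - 3 * x) ** 4

def numeric_derivative(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

for x in [-2.0, 0.0, 1.0, 3.0]:
    exact = h_prime(x)
    assert abs(exact - numeric_derivative(h, x)) < 1e-5 * max(1.0, abs(exact))
print(h_prime(0.0), h_prime(1.0))  # 0.6 0.9375
```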

In (I) you would apply the quotient rule first since you have a rational function, and then do the chain rule on the part that involves the exponent. I hope this helps you distinguish the order a little better. Have fun differentiating!

Here’s the work for I, with the exponent inside the numerator:

$$g(x)=\boxed{\frac{\boxed{\left(\boxed{4x-5}\right)^3}}{\boxed{5-3x}}}$$

$$g'(x)=\frac{\boxed{\left(\,\boxed{4x-5}\,\right)^3}\,’\cdot\left(\boxed{5-3x}\right)-\boxed{\left(\,\boxed{4x-5}\,\right)^3}\,\cdot\boxed{5-3x}\,’}{\left(\,\boxed{5-3x}\,\right)^2}\\

=\frac{3\left(\,\boxed{4x-5}\,\right)^2\cdot\boxed{4x-5}\,’\cdot\left(5-3x\right)-\left(\,\boxed{4x-5}\,\right)^3\cdot\left(-3\right)}{\left(\,\boxed{5-3x}\,\right)^2}\\

=\frac{3\left(4x-5\right)^2\cdot4\cdot\left(5-3x\right)+3\left(4x-5\right)^3}{\left(5-3x\right)^2}\\=\frac{3\left(4x-5\right)^2\left(20-12x+4x-5\right)}{\left(5-3x\right)^2}\\=\frac{3\left(4x-5\right)^2\left(15-8x\right)}{\left(5-3x\right)^2}$$

Here again, the boxes represent where I would be focusing as I wrote each part of the work, not something I would actually write. Some instructors tell you to stop before simplifying, so they can grade a problem on the calculus alone, since many students would miss the idea of factoring in order to simplify (as I almost did here). But simplifying is very useful practice!
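The factoring step is where errors tend to creep in, so here is a Python sketch (illustrative names, with the denominator \(5-3x\) used in the work above) comparing the unfactored and factored forms of \(g'(x)\) with each other and with a numerical derivative.

```python
def g(x):
    return (4 * x - 5) ** 3 / (5 - 3 * x)

def g_prime_unfactored(x):
    # [3(4x-5)^2 * 4 * (5-3x) + 3(4x-5)^3] / (5-3x)^2
    return (3 * (4 * x - 5) ** 2 * 4 * (5 - 3 * x) + 3 * (4 * x - 5) ** 3) / (5 - 3 * x) ** 2

def g_prime_factored(x):
    # 3(4x-5)^2 (15-8x) / (5-3x)^2
    return 3 * (4 * x - 5) ** 2 * (15 - 8 * x) / (5 - 3 * x) ** 2

def numeric_derivative(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

for x in [-2.0, 0.0, 1.0, 3.0]:
    a, b = g_prime_unfactored(x), g_prime_factored(x)
    assert abs(a - b) < 1e-9 * max(1.0, abs(a))
    assert abs(b - numeric_derivative(g, x)) < 1e-5 * max(1.0, abs(b))
print(g_prime_factored(0.0), g_prime_factored(1.0))  # 45.0 5.25
```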

We can close by looking inside the product rule, from 2002:

Proof of Product and Quotient Rules I would like to know how the product rule and the quotient rule came about so I can better understand calculus. I have been to the math tutors and they don't know. Thank you, Theron

I answered by starting with a proof:

Hi, Theron. Let's try using the definition of the derivative, and see what happens: Suppose we have two differentiable functions f and g, and we have defined a new function p(x) = f(x) * g(x). Its derivative will be

p'(x) = lim[h->0] (p(x+h) - p(x))/h = lim[h->0] (f(x+h)g(x+h) - f(x)g(x))/h

Here we’re applying the definition of the derivative to *p*, and then using the definition of *p* to write it as a product.

I'd like to get this to include something that looks like the definition of f'; I'll try adding and subtracting an intermediate term, f(x)g(x+h):

p'(x) = lim[h->0] (f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x))/h

That lets me factor something out of each pair:

= lim[h->0] ([f(x+h) - f(x)]g(x+h) + f(x)[g(x+h) - g(x)])/h

= lim[h->0] [ (f(x+h) - f(x))/h * g(x+h) + f(x) * (g(x+h) - g(x))/h ]

A little magic with the limits (which has to be proven valid), and we get

= lim[h->0] (f(x+h) - f(x))/h * lim[h->0] g(x+h) + lim[h->0] f(x) * lim[h->0] (g(x+h) - g(x))/h

= f'(x) g(x) + f(x) g'(x)

We've got it!

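It’s not really magic, just the theorems that the limit of a sum is the sum of the limits and the limit of a product is the product of the limits, if they exist. We also use \(\lim_{h\to0}g(x+h)=g(x)\), which holds because a differentiable function is continuous.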

Theron asked not primarily for a proof, but for **understanding**. Can we get that from the proof?

Now let's look at what this means. Suppose the length of a rectangle is varying with time according to a function f(t), and the width is g(t). During a small time interval from t to t+dt, how has the area f(t)g(t) changed? [Diagram: a rectangle with sides f(t) and g(t); a thin strip of width g'(t) dt is added along the side of length f(t), a thin strip of width f'(t) dt is added along the side of length g(t), and the two strips meet in a small corner square.] As shown, we can approximate the change in f(t) by f'(t) dt, and the change in g(t) by g'(t) dt. The change in the area fg consists of the two long rectangles and the little square: fg' dt + f'g dt + f'g' dt^2. Divide this by dt, and we have fg' + f'g + f'g' dt.

The small lengths in the picture are differentials, which approximate the actual changes in \(f(t)\) and \(g(t)\). I changed the independent variable to *t* to avoid any sense that *x* might be the horizontal variable.

Since dt is very small, we can ignore the contribution from the tiny square, and the derivative is fg' + f'g. That's not a formal proof, but it gives a feel for what is happening. Stare at the picture for a while, and you'll see why each derivative is multiplied by the other function, and why there are two terms added together. [Diagram: the rectangle fg, with the strip along the side of length f(t) labeled fg' and the strip along the side of length g(t) labeled f'g, each strip's area having been divided by dt.]
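The rectangle picture can be checked exactly when the sides change linearly, because then the approximations f'(t) dt and g'(t) dt are the actual changes. This Python sketch uses hypothetical sides \(f(t)=3+2t\) and \(g(t)=4+t\) (my choice, for illustration) and exact fractions: the change in area is exactly the two strips plus the corner, and dividing by dt leaves \(fg'+f'g\) plus a leftover \(f'g'\,dt\) that shrinks with dt.

```python
from fractions import Fraction

# Hypothetical linear sides, so f' and g' are constants.
def f(t):
    return 3 + 2 * t      # f'(t) = 2

def g(t):
    return 4 + t          # g'(t) = 1

F_PRIME, G_PRIME = 2, 1

t = Fraction(1)
for dt in [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]:
    change = f(t + dt) * g(t + dt) - f(t) * g(t)
    strips = f(t) * G_PRIME * dt + F_PRIME * g(t) * dt   # the two long rectangles
    corner = F_PRIME * G_PRIME * dt**2                   # the little square
    assert change == strips + corner
    print(dt, change / dt)   # approaches f*g' + f'*g = 5*1 + 2*5 = 15
```

At \(t=1\) this prints 76/5, 751/50, and 7501/500 (that is, 15.2, 15.02, 15.002), closing in on \(fg'+f'g=15\).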

This is actually where the product rule came from – before we had formal concepts of limits, differentials and hand-waving were standard in calculus.

To get the quotient rule, just apply the product rule to q(x) = f(x) [1/g(x)] using the chain rule to find the derivative of 1/g(x) = g(x)^-1.

That’s the derivation we saw above.


We’ll start with a question from 1997:

Calculus Chain Rule I can't understand the chain rule. Every time I ask someone to explain it they use y's and u's, etc... could you give me the chain rule in easy terms, like how to do it, not just give me a formula like y=(U)^2? Thanks. Stu

A full statement of the chain rule tends to need lots of letters and tangled expressions. The short form Stu probably saw looks like $$\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$$ This says that if *y* is a function of *u*, and *u* is a function of *x*, then the derivative of the **composite function** *y* with respect to *x* is the product of the derivative of *y* with respect to *u*, and the derivative of *u* with respect to *x*.

What this says is simple, and almost intuitive. For example, suppose your altitude, *h*, is a function of the distance, *s*, along a road, and your distance along the road is a function of time, *t*. The road has a certain slope at any time, which is the rate of change of altitude with respect to distance, \(\frac{dh}{ds}\) meters up per meter forward; and your car has a certain speed, which is the rate of change of distance with respect to time, \(\frac{ds}{dt}\) meters per second. How fast are you going up? Every second you are moving \(\frac{ds}{dt}\) meters forward, and therefore \(\frac{dh}{ds}\cdot\frac{ds}{dt}\) meters up. That’s the derivative of the composite function \(h(t)\).
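The road example translates directly into numbers. In this Python sketch the road profile and the trip are hypothetical (altitude \(h(s)=50+\frac{s^2}{4000}\) meters and distance \(s(t)=20t+\frac{t^2}{2}\) meters, chosen only so the arithmetic comes out clean); the climb rate from \(\frac{dh}{ds}\cdot\frac{ds}{dt}\) matches a direct numerical derivative of altitude with respect to time.

```python
def altitude(s):        # h(s): meters of altitude, s meters along the road
    return 50 + s**2 / 4000

def road_slope(s):      # dh/ds: meters up per meter forward
    return s / 2000

def distance(t):        # s(t): meters traveled after t seconds
    return 20 * t + t**2 / 2

def speed(t):           # ds/dt: meters per second
    return 20 + t

t = 10.0
climb_rate = road_slope(distance(t)) * speed(t)   # dh/ds * ds/dt

step = 1e-5
direct = (altitude(distance(t + step)) - altitude(distance(t - step))) / (2 * step)
print(climb_rate)   # 3.75 meters up per second
```

After 10 seconds the car is 250 meters along, where the road rises 0.125 meters per meter, and it is moving 30 meters per second, so it climbs \(0.125\cdot30=3.75\) meters per second.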

But often we don’t have a “*u*” in the problem, or any other variable – just one big expression like \(\cos(\tan(5x-3))\). Then what?

Doctor Scott answered:

Hi Stu! Good question. I was just skimming an excellent Calculus book written by Paul Foerster where this very question was addressed. His suggestion was that you should think of the chain rule as a process rather than a rule with a lot of du/dx and dy/dx's. So, here goes....

It’s like using muscle memory rather than written instructions.

Remember that the chain rule is used to find the derivative of *compositions of functions* - that is, functions that have functions inside of them. For example, the function sin(x^2) can be thought of as a composition of two other functions, sin x and x^2, with the x^2 being INSIDE the sin function. Similarly, the function (x^2 - 5x + 8)^(1/2) is also a composition of two other functions, (x^2 - 5x + 8) and x^(1/2), with the first function being INSIDE the second. One more example? The function cos(tan(5x-3)) is the composition of three functions, 5x - 3 inside of tan x, inside of cos x. So the chain rule gets applied when there is some function INSIDE of another function.

We’ll be working all three of these examples.

We traditionally represent composite functions as boxes connected by “plumbing” (or, if you prefer, “links in a chain”, the reason for the term “chain rule”):

In particular, for our first example, we might think of it this way:

This sort of diagram has to be read backward; the first function named in the expression is the last one in line. I like to think of it this way instead, which reads more naturally:

Each function/fish “eats” the one in front of it, producing “meat” that is eaten by the one behind. Then, we have fish inside fish:

We can see this in the expression: $$\require{AMSmath}y=\boxed{\sin\left(\,\boxed{x^2\strut}\,\right)}$$

And we’ll apply the chain rule by following the “food chain” from the outside in.

The stuff that people have been telling you probably goes something like this: If y = sin(x^2), then we can write this function as the composition of y = sin u and u = x^2. (Again, notice that the x^2 is INSIDE of the sin function.) Then, dy/dx = dy/du * du/dx. So, we have dy/dx = cos u * 2x; but u = x^2, so we have dy/dx = 2x cos(x^2).

Here, *u* is just a temporary name we’re giving to the result of the inside function, like this:

This approach works fine when we are given named variables (as we’ll see later), but it gets in the way for problems like ours, where the function is written as one big expression.

But we don’t need names; we can just *do it*:

How about another way? Let's think of the chain rule as a process. The derivative of a composite function is the DERIVATIVE OF THE OUTSIDE FUNCTION TIMES the DERIVATIVE OF THE INSIDE FUNCTION. In practice, here's how it works. Consider y = sin(x^2). The outside function is a sine function; its derivative is cosine, so we have (so far) cos(x^2). Now, INSIDE the sine function is x^2. Its derivative is 2x, so now we have 2x cos(x^2). Notice that there is no other function "inside" the x^2, so we are done.

The key idea is that we have to keep the same thing inside the derivative that was inside the function itself. I like to think of it like this, putting a box (or at least imagining it) around the inside function and thinking of it as a single entity (as if it were a variable):

$$y=\boxed{\sin\left(\,\boxed{x^2\strut}\,\right)}$$

$$y’=\cos\left(\,\boxed{x^2\strut}\,\right)\cdot\boxed{x^2\strut}\,’=\cos\left(\,\boxed{x^2\strut}\,\right)\cdot2x$$

To differentiate “sine of something”, we multiply “cosine of something” by the derivative of “something”.
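“Cosine of something times the derivative of something” can be written as a reusable recipe. Here is a Python sketch (the helper `chain` is a hypothetical name, not a library function) that builds the derivative of \(\sin(x^2)\) from the outside derivative and the inside function, then compares it with \(2x\cos(x^2)\) and with a numerical derivative:

```python
import math

def chain(outer_prime, inner, inner_prime):
    """Derivative of outer(inner(x)): outer' applied to the inside, times inner'."""
    return lambda x: outer_prime(inner(x)) * inner_prime(x)

# y = sin(x^2): the outside is sin (derivative cos), the inside is x^2 (derivative 2x).
dy_dx = chain(math.cos, lambda x: x**2, lambda x: 2 * x)

def numeric_derivative(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

for x in [0.0, 0.5, 1.3, -2.0]:
    assert math.isclose(dy_dx(x), 2 * x * math.cos(x**2), abs_tol=1e-12)
    assert abs(dy_dx(x) - numeric_derivative(lambda t: math.sin(t**2), x)) < 1e-6
print(dy_dx(0.0))  # 0.0
```

Note that `outer_prime` receives `inner(x)`, not `x`: that is the “keep the same thing inside” step.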

Let's look at a couple more examples: y = (x^2 - 5x + 8)^(1/2). The OUTSIDE FUNCTION is basically a power rule problem, so we have 0.5(x^2 - 5x + 8)^(-1/2) using the power rule. The INSIDE FUNCTION is x^2 - 5x + 8; its derivative is 2x - 5, so we have y' = (2x - 5)(.5)(x^2 - 5x + 8)^(-1/2).

Here we have

$$y=\boxed{\left(\,\boxed{x^2-5x+8\strut}\,\right)^{1/2}}$$

$$y’=\frac{1}{2}\left(\,\boxed{x^2-5x+8\strut}\,\right)^{-1/2}\cdot\boxed{x^2-5x+8\strut}\,’\\=\frac{1}{2}\left(\,\boxed{x^2-5x+8\strut}\,\right)^{-1/2}\cdot(2x-5)$$

This could also have been written as $$y=\boxed{\sqrt{\,\boxed{x^2-5x+8\strut}\,}\,}$$

$$y’=\frac{1}{2\sqrt{\,\boxed{x^2-5x+8\strut}\,}\,}\cdot\,\boxed{x^2-5x+8\strut}\,’\\=\frac{2x-5}{2\sqrt{\,\boxed{x^2-5x+8\strut}\,}\,}$$
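The power form and the radical form are the same function. This Python sketch (illustrative names) checks that they agree and that both match a numerical derivative of \(\sqrt{x^2-5x+8}\); since \(x^2-5x+8\) has negative discriminant (\(25-32=-7\)), the square root is defined for every real *x*.

```python
import math

def by_power_rule(x):
    # (1/2)(x^2 - 5x + 8)^(-1/2) * (2x - 5)
    return 0.5 * (x**2 - 5 * x + 8) ** -0.5 * (2 * x - 5)

def by_radical_form(x):
    # (2x - 5) / (2 sqrt(x^2 - 5x + 8))
    return (2 * x - 5) / (2 * math.sqrt(x**2 - 5 * x + 8))

def y(x):
    return math.sqrt(x**2 - 5 * x + 8)

def numeric_derivative(f, x, step=1e-6):
    return (f(x + step) - f(x - step)) / (2 * step)

for x in [-1.0, 0.0, 2.5, 4.0]:
    assert math.isclose(by_power_rule(x), by_radical_form(x), rel_tol=1e-12, abs_tol=1e-15)
    assert abs(by_radical_form(x) - numeric_derivative(y, x)) < 1e-6
print("power and radical forms agree")
```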

I generally rewrite radicals as fractional powers, rather than memorize separate formulas for radicals.

y = cos(tan(5x-3)). The outermost function is a cosine, so its derivative is negative sine: -sin(tan(5x-3)). Inside the cosine is a tan function; its derivative is sec^2, so we now have sec^2 (5x-3) * (-sin(tan(5x-3))). Finally, inside of the tan function is 5x-3; its derivative is 5. So, FINALLY, we have 5 * sec^2 (5x-3) * (-sin(tan(5x-3))). Or, simplifying, we get y' = -5 sec^2 (5x-3) sin(tan(5x-3)).

This has three layers (outside, middle, and inside):

$$y=\boxed{\cos\left(\,\boxed{\tan\left(\,\boxed{5x-3\strut}\,\right)}\,\right)}\\\\y’=-\sin\left(\,\boxed{\tan\left(\,\boxed{5x-3\strut}\,\right)}\,\right)\cdot\boxed{\tan\left(\,\boxed{5x-3\strut}\,\right)}\,’\\

=-\sin\left(\,\boxed{\tan\left(\,\boxed{5x-3\strut}\,\right)}\,\right)\cdot\sec^2\left(\,\boxed{5x-3\strut}\,\right)\cdot\boxed{5x-3\strut}\,’\\=-\sin\left(\,\boxed{\tan\left(\,\boxed{5x-3\strut}\,\right)}\,\right)\cdot\sec^2\left(\,\boxed{5x-3\strut}\,\right)\cdot5$$
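The outside-to-inside bookkeeping is exactly what forward-mode automatic differentiation automates. That technique is not part of the original answers; this Python sketch (with hypothetical names `Dual` and `lift`) carries each value together with its derivative, and each layer multiplies the incoming derivative by its own derivative evaluated at the inside, which is one application of the chain rule per layer.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    value: float   # the number itself
    deriv: float   # its derivative with respect to x

def lift(f, f_prime):
    """Make a layer that applies f and performs one step of the chain rule."""
    return lambda d: Dual(f(d.value), f_prime(d.value) * d.deriv)

inner = lift(lambda x: 5 * x - 3, lambda x: 5.0)          # 5x - 3
middle = lift(math.tan, lambda u: 1 / math.cos(u) ** 2)   # tan, derivative sec^2
outer = lift(math.cos, lambda u: -math.sin(u))            # cos, derivative -sin

x = 0.4
result = outer(middle(inner(Dual(x, 1.0))))   # seed: dx/dx = 1

formula = -5 * (1 / math.cos(5 * x - 3) ** 2) * math.sin(math.tan(5 * x - 3))
print(math.isclose(result.deriv, formula, rel_tol=1e-12))  # True
```

Reading the last call from the inside out mirrors the boxes: the innermost layer supplies the factor 5, the middle layer supplies \(\sec^2(5x-3)\), and the outer layer supplies \(-\sin(\tan(5x-3))\).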

So, it helps a lot to think of the chain rule as: The derivative of the outside TIMES the derivative of what's inside!

Consider this question from 1999, using another notation, which is technically more precise, but even more confusing to read:

Chain Rule Notation I'm trying to figure out these questions: Formula: (f◦g)'(x) = g'(x) · f'[g(x)]. 1) f(x) = 2x+6, g(x) = 3x-4, (f◦g)'(x) = 3 · 2 = 6. I know that g'(x) = 3 but how about f'[g(x)]? How does 2 come about? I don't understand how it's done. 2) g(x) = 2x^2 + 5, h(x) = x^4, (g◦h)'(x) = 4x^3 · 4x^4 = 16x^7. It's the same here. I know how to differentiate h(x) but I got stuck on g'[h(x)]. How does 4x^4 come by? Please help me, Thanks.

The notation in the question *means* exactly what we’ve been doing. I prefer to write it in this order: $$(f\circ g)'(x)=f'(g(x))\cdot g'(x).$$ This means that the derivative of a composite function \(h(x)=(f\circ g)(x)=f(g(x))\) is the derivative of the **outside** function, *f*, **applied to the inside function**, *g*, times the derivative of the **inside** function, *g*.

The way I like to think about it, using the idea we saw above, is

$$\require{AMSsymbols}\boxed{f\left(\square\right)}\,’=f\,’\left(\square\right)\cdot\square\,’.$$

In the examples here, we are given the two functions separately (but with the same variable *x*, rather than an intermediate variable *u*). In the first question, functions were written as a single composite expression; in that form, respectively, these would be \(2(3x-4)+6\) and \(2(x^4)^2+5\). In this form, the inside functions could be marked like this: $$2\left(\,\boxed{3x-4\strut}\,\right)+6$$ and $$2\left(\,\boxed{x^4\strut}\,\right)^2+5$$

Doctor Mitteldorf answered, recommending the *u* formulation we avoided above:

Dear Eric, The chain rule can be taught in such a way that it's quite transparent, or it can be made utterly mysterious with bad notation. It looks as if you've been a victim of the latter. The chain rule is about taking the derivative of a function of a function. Instead of f being a function of x, we have f is a function of g, and g is a function of x. In this notation, the chain rule can be written: df/dx = df/dg · dg/dx. It seems almost obvious when you write it that way. Just "cancel out" the dg's in the numerator and denominator.

In defense of the function notation form: it makes explicit that the derivative of *f* is applied to \(g(x)\), not to *x*, and it emphasizes that the derivative is a new **function** *f*’, not a new **variable**. And this form is the most suitable for these problems, in which functions are named.

Doctor Mitteldorf here used *g* not only as a function name, but also as a variable representing its output. He did this, presumably, to avoid bringing in another variable (the *u* that was confusing above) to represent \(g(x)\). And many authors prefer to avoid the d notation precisely because it looks so “obvious”, as if you are just canceling in a fraction. (See What Derivative Notations Mean.) The latter notation is very useful as a *reminder* of this rule, but mathematicians are uncomfortable talking as if that is all there is to it. (See What Do dx and dy Mean?)

In your example (1), f(x) = 2x+6, g(x) = 3x-4. The teacher gave you a notation that's deliberately confusing. You have to remember that the x in these equations is a dummy variable. The top equation just says f is a function that takes its argument, multiplies it by 2, then adds 6. The x is there just as a placeholder. You can replace it with a or b or theta or phi and the equation says exactly the same thing.

This is important: The *x*‘s in the two function definitions will represent different numbers. So giving them different names helps to differentiate them (no pun intended).

But in this case, you want to replace it with g: f(g) = 2g+6. Is it obvious why I want to replace the x by a g? It's because f◦g means "f composed with g," or "the function f taken of the function g."

Distinguishing the variables called *x* is essential.

Coming back now to problem 1, let's do it two ways. First, we'll actually find f◦g and differentiate it. Second, we'll use the chain rule. Then we'll be in a position to check that the two answers are the same. First, f(g) = 2g+6 and g(x) = 3x-4. Substituting the second equation into the first, you have (f◦g)(x) = 2(3x-4)+6 = 6x-2. It's obvious, then, that (f◦g)'(x) = 6.

This process of expanding the composite function can be time-consuming; the chain rule is usually a time-saver. The point here is that it is not a necessity! It gives the same result as direct differentiation.

Second, we'll use the chain rule: df/dx = df/dg · dg/dx = 2 · 3 = 6. Hence, we get the same answer both ways.
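Doctor Mitteldorf’s two-way check can be automated for polynomials. This Python sketch (all helper names are my own; a polynomial is a list of integer coefficients, constant term first) finds the composite by substitution and differentiates it, then separately computes \(f'(g(x))\cdot g'(x)\), for both of Eric’s problems.

```python
def add(p, q):
    n = max(len(p), len(q))
    return [a + b for a, b in zip(p + [0] * (n - len(p)), q + [0] * (n - len(q)))]

def mul(p, q):
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def compose(f, g):
    """Coefficients of f(g(x)), substituting g into f by Horner's method."""
    result = [0]
    for c in reversed(f):
        result = add(mul(result, g), [c])
    return result

def deriv(p):
    return [i * c for i, c in enumerate(p)][1:] or [0]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def check(f, g):
    direct = trim(deriv(compose(f, g)))                    # expand, then differentiate
    by_chain = trim(mul(compose(deriv(f), g), deriv(g)))   # f'(g(x)) * g'(x)
    return direct, by_chain

# Problem 1: f(x) = 2x + 6, g(x) = 3x - 4
print(check([6, 2], [-4, 3]))             # ([6], [6])
# Problem 2: g(x) = 2x^2 + 5, h(x) = x^4; both lists hold 16 in position 7, i.e. 16x^7
print(check([5, 0, 2], [0, 0, 0, 0, 1]))
```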

This was a particularly simple example, where both derivatives are constant.

Certain functions can make this harder. Here is a question from 1998:

Trigonometry and the Chain Rule I have three questions that have me stumped. I need to differentiate the following: y = 2 csc^3(sqrt(x)); y = x/2 - (sin(2x))/4; y = (1 - cos(x))/sin(x).

Doctor Santu answered, solving his own examples so Amanda could learn by doing her own homework:

Amanda: These all have to do with the Chain Rule. Here's the basic idea. Suppose: y = sin(x^3 + tan(x)). How do you find the derivative? Think of x^3 + tan x as a big BLOB. So we really need to find the derivative of: y = sin(BLOB). Well, the rule says that the derivative of sin(BLOB) is simply cos(BLOB) multiplied by the derivative of the BLOB itself. A word on notation: I'm going to write y' for the derivative of y (instead of dy/dx).

The BLOB idea is the same as my “something” or my boxes. I’ve been known to use the same word.

Now, in this case: y' = cos(x^3 + tan(x)) * (3x^2 + sec^2(x)) because BLOB is x^3 + tan(x), and the derivative of the BLOB is 3x^2 + sec^2(x).
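Here is a minimal Python sketch, using only the standard `math` module, that checks Doctor Santu's formula against a central-difference estimate at a few sample points. The names are illustrative; sec²(x) is computed as 1/cos²(x) because `math` has no secant function, and the sample points stay away from x = π/2, where tan is undefined.

```python
import math


def y(x):
    return math.sin(x**3 + math.tan(x))


def y_prime(x):
    """Chain rule: cos(BLOB) times the derivative of BLOB = x^3 + tan(x)."""
    blob = x**3 + math.tan(x)
    blob_prime = 3 * x**2 + 1 / math.cos(x) ** 2  # sec^2(x) = 1/cos^2(x)
    return math.cos(blob) * blob_prime


def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in [0.1, 0.5, 1.0]:
    print(f"x = {x}: formula = {y_prime(x):.6f}, "
          f"estimate = {numeric_derivative(y, x):.6f}")
```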

Now we come to something important: when we write a power of a trig function by putting an exponent on the function name (which, as I explained here, is a notation left over from before general function notation was introduced, as is the permission to omit parentheses), we hide the fact that **the power is the outside function**, and the trig function is the inside one:

Let's try another example: y = sin^3(x^3 + tan x). This is really: y = [ sin(x^3 + tan x) ]^3. Using our previous terminology, the derivative of (blob)^3 is simply: 3(blob)^2 * (the derivative of blob itself). In this case: y' = 3[sin(x^3 + tan(x))]^2 * (derivative of sin(x^3 + tan(x))) = 3[sin(x^3 + tan(x))]^2 * cos(x^3 + tan(x)) * (derivative of x^3 + tan(x)) = 3[sin(x^3 + tan(x))]^2 * cos(x^3 + tan(x)) * (3x^2 + sec^2(x))

By writing the exponent on the **outside**, we make it easier to see that the sine is the **inside** function.

In my formulation with boxes, this is $$y=\boxed{\,\left(\,\boxed{\sin\left(\,\boxed{x^3+\tan(x)}\,\right)}\,\right)^3}\\\\y’=3\left(\,\boxed{\sin\left(\,\boxed{x^3+\tan(x)}\,\right)}\,\right)^2\cdot\boxed{\sin\left(\,\boxed{x^3+\tan(x)}\,\right)}\,’\\=3\left(\,\boxed{\sin\left(\,\boxed{x^3+\tan(x)}\,\right)}\,\right)^2\cdot\cos\left(\,\boxed{x^3+\tan(x)}\,\right)\cdot\boxed{x^3+\tan(x)}\,’\\=3\left(\,\boxed{\sin\left(\,\boxed{x^3+\tan(x)}\,\right)}\,\right)^2\cdot\cos\left(\,\boxed{x^3+\tan(x)}\,\right)\cdot\left(3x^2+\sec^2(x)\right)$$

Although the exponent is written after the parentheses, the boxes make it clear that the power is the outside function, which wasn’t obvious in the original notation.

In the chain rule, the basic idea is to peel the onion from the outside. You want to take the derivative of a function within a function within a function. You take the derivative of the outermost function relative to the stuff that's inside it, then multiply that by the derivative of the inside expression, relative to the expression inside the expression, and so on, all the way down to the tiniest little x all the way inside. (And some people even stick a "1" on at the end, because the derivative of an x is just 1. I think that's overdoing it a bit.) The Chain Rule needs quite a lot of imagination to see these formulas as expressions within expressions, and ideally you should have a friend sit by you and point out how to "peel the onion" layer by layer.
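The peeling procedure can be sketched in Python as a loop over layers. This is a hypothetical helper of my own (`peel_the_onion` is not a library function): each layer is a pair (function, derivative), listed from the outside in. The code first evaluates from the inside out to find what each layer is applied to, then multiplies together each layer's derivative evaluated at the stuff inside it. Starting the product at 1.0 quietly plays the role of the "1" some people tack on for the innermost x.

```python
import math
from typing import Callable, List, Tuple

# A layer is (function, derivative), both functions of one real variable.
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def peel_the_onion(layers: List[Layer], x: float) -> Tuple[float, float]:
    """Return (value, derivative) of the composite function at x.

    `layers` runs from the OUTSIDE in: [outer, middle, inner] means
    outer(middle(inner(x))).
    """
    # Evaluate from the inside out, recording each layer's input.
    inputs = []
    value = x
    for func, _ in reversed(layers):
        inputs.append(value)
        value = func(value)
    inputs.reverse()  # now inputs[i] is what layers[i] is applied to

    # Peel from the outside in: multiply each layer's derivative,
    # evaluated at the expression inside it.
    derivative = 1.0
    for (_, func_prime), inside in zip(layers, inputs):
        derivative *= func_prime(inside)
    return value, derivative


# y = [sin(x^3 + tan x)]^3, the sin^3 example above, written outside in.
sin_cubed_layers: List[Layer] = [
    (lambda u: u**3, lambda u: 3 * u**2),           # ( )^3
    (math.sin, math.cos),                           # sin( )
    (lambda x: x**3 + math.tan(x),
     lambda x: 3 * x**2 + 1 / math.cos(x) ** 2),    # x^3 + tan(x)
]

value, slope = peel_the_onion(sin_cubed_layers, 0.5)
print(f"y(0.5) = {value:.6f}, y'(0.5) = {slope:.6f}")
```

Listing the layers from the outside in mirrors the way the onion is peeled; the only extra bookkeeping is the inside-out pass that records what each layer is applied to.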

I’ve skipped a couple of examples; he closed with an example *five* layers deep:

One final example: y = sin(tan(sin^2(x^7 + 3x))). y' = ...? You must first take the derivative of sin (expression), relative to the expression that's inside. You multiply that by the derivative of the tan (inside expression). You multiply that by the derivative of [sin (x^7 + 3x)]^2, because sin^2 (x^7+3x) means [sin(x^7+3x)]^2. That, in turn, will contain the derivative of sin(x^7 + 3x), which in turn will contain the derivative of (x^7 + 3x), which is 7x^6 + 3. It's important to put the proper expression inside the various partial expressions. So: y' = cos(tan ...) * sec^2(sin ...) * 2[sin ...] * cos(x^7 + 3x) * (7x^6 + 3). You have to know what I have left out, and you must know how to put it in. I suggest you complete the derivative just above, inserting all the expressions that would take the place of the ...s, then try the problems you're interested in. (All of us at Dr. Math had to practice these too.)

The function can be seen as $$y=\boxed{\sin\left(\,\boxed{\tan\left(\,\boxed{\left(\,\boxed{\sin\left(\,\boxed{x^7+3x\strut}\right)}\,\right)^2}\,\right)}\,\right)}$$

$$y’=\cos\left(\,\boxed{\tan\left(\,\boxed{\left(\,\boxed{\sin\left(\,\boxed{x^7+3x\strut}\right)}\,\right)^2}\,\right)}\,\right)\\\cdot\sec^2\left(\,\boxed{\left(\,\boxed{\sin\left(\,\boxed{x^7+3x\strut}\right)}\,\right)^2}\,\right)\\\cdot2\,\boxed{\sin\left(\,\boxed{x^7+3x\strut}\,\right)}\\\cdot\cos\left(\,\boxed{x^7+3x\strut}\,\right)\\\cdot\left(7x^6+3\strut\right)$$
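As a check on the completed five-layer derivative, here is a minimal Python sketch (illustrative names, standard `math` module only) that codes each factor of y' exactly as it appears in the boxed formula and compares the product with a central-difference estimate. Since sin^2(x^7 + 3x) always lies between 0 and 1, the tangent layer never reaches its asymptote, so any real x is safe here.

```python
import math


def y(x):
    return math.sin(math.tan(math.sin(x**7 + 3 * x) ** 2))


def y_prime(x):
    """Product of the five factors in the boxed formula for y'."""
    blob = x**7 + 3 * x   # innermost layer
    s = math.sin(blob)    # sin(x^7 + 3x)
    sq = s**2             # sin^2(x^7 + 3x)
    return (
        math.cos(math.tan(sq))     # derivative of sin( ), at tan(sin^2(...))
        * (1 / math.cos(sq) ** 2)  # derivative of tan( ) is sec^2, at sin^2(...)
        * 2 * s                    # derivative of ( )^2, at sin(...)
        * math.cos(blob)           # derivative of sin( ), at x^7 + 3x
        * (7 * x**6 + 3)           # derivative of x^7 + 3x
    )


def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in [0.2, 0.5, 0.9]:
    print(f"x = {x}: formula = {y_prime(x):.6f}, "
          f"estimate = {numeric_derivative(y, x):.6f}")
```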

We’ll close with this, from 2004:

Chain Rule Applied to Exponential Functions

At a time t hours after it was administered, the concentration of a drug in the body is f(t) = 27 e^(-0.14t) ng/ml. What is the concentration 4 hours after it was administered? At what rate is the concentration changing at that time? I got lost finding the derivative of the problem to find the rate of change. Part 1. f(4) = 27 e^(-0.14(4)) = 27e^(-0.56) = 15.42 ng/ml. Part 2. Chain rule = f'(g(x)) x g'(x) = 27'(e^-0.56) x (e^-0.56). I get lost after that. I don't know if I am going in the right direction and I think the derivative of 27 = 0, so the whole first half of the problem would equal 0.

This is not really a hard application of the chain rule, but the notation is a little awkward. (A major error is replacing *t* with 4 before differentiating, so there is no variable left!)

Doctor Mike answered:

Hi Brendan, The derivative of a constant times a function is just that constant, times the derivative of the function. So, the derivative of 27 e^(-.14t) is 27 times the derivative of e^(-0.14t) . So, let's just concentrate on the derivative of e^(-0.14t) and you can put it all together later. OK?

We can think of the 27 as representing an outer function \(a(x)=27x\), whose derivative is 27; but it’s easier just to let it pass through the process.

People often have problems with the Chain Rule applied to exponentials, because they are not clear about what is the "outside" function and what is the "inside" function. That's why I like to use the notation exp(x) in place of the notation e^x when we do problems like this. Also, let's give a function name "h" to what is in your original exponent. That is, define it like h(t) = -0.14t . Then, e^(-0.14t) can be written as exp( h(t) ) which clearly shows that the exponential is the outside function, and what we have called "h" is the inside function.

So we have the function \(f(t)=27\exp(h(t))\), where \(\exp(x)=e^x\) and \(h(t)=-0.14t\).

What do we do with this now? To use the Chain Rule you have to know how to differentiate both functions that are involved. The exponential function is its own derivative: exp'(t) = exp(t). You should have seen this already. For the other one, h'(t) = -0.14. That you should have seen a long time ago. Right? So, to use the Chain Rule on exp( h(t) ) you get exp'( h(t) ) * h'(t) which is exp( h(t) ) * (-0.14) . In this last expression, exp( h(t) ) is the derivative of the outside function evaluated at the inside function, and (-0.14) is the derivative of the inside function.

We don’t *have* to do this renaming, as long as we see the exponent as the inside function: $$f(t)=27e^{\boxed{-0.14t}}$$ If you find the “exp” function helpful, use it: $$f(t)=27\exp\left(\,{\boxed{-0.14t}}\,\right)$$

The general notation for differentiating f(t) = g( h(t) ) with the C.R. is simply f'(t) = g'( h(t) ) * h'(t) . If you spend some time to get your function expressed in that way, then the rest will be easier.

Carrying out the process, we have $$\boxed{27\boxed{e^{\boxed{-0.14t}}}}’\\=27\boxed{e^{\boxed{-0.14t}}}’\\=27e^{\boxed{-0.14t}}\cdot\boxed{-0.14t}’\\=27e^{\boxed{-0.14t}}\cdot(-0.14)$$
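To finish Brendan's problem numerically, here is a minimal Python sketch (the constant names `A` and `K` are my own labels for 27 and -0.14) that evaluates the concentration and the chain-rule derivative f'(t) = 27e^(-0.14t) · (-0.14) at t = 4, substituting t = 4 only after differentiating.

```python
import math

A = 27.0   # concentration at t = 0, in ng/ml
K = -0.14  # coefficient in the exponent, per hour


def f(t):
    """Concentration 27 e^(-0.14t) in ng/ml, t in hours."""
    return A * math.exp(K * t)


def f_prime(t):
    """Chain rule: the 27 passes through, exp is its own derivative
    (evaluated at the inside function -0.14t), times the inside
    derivative -0.14."""
    return A * math.exp(K * t) * K


t = 4  # substitute only AFTER differentiating
print(f"f(4)  = {f(t):.2f} ng/ml")
print(f"f'(4) = {f_prime(t):.2f} ng/ml per hour")
```

This prints f(4) = 15.42 ng/ml, matching Brendan's Part 1, and f'(4) = -2.16 ng/ml per hour; the negative sign says the concentration is decreasing at that time.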
