We’ll start with a 1998 question, which will open the door:

The Log of a Negative Number

Hi, I ran into a revision problem with logs which wasn't the problem itself, just a gaping hole in my learning. They had given us what turned out to be a log of a quadratic, with a x-3 factor or something. I was checking my answers like the good student I was, but my calculator couldn't do log (-3). I guessed that the answer had to be complex. I consulted my Maths coordinator and he agreed with me and said that (for example), log -10 equaled i. I had a play around at home and devised that x^i = -x (where x is a real number) and a few other rules for complex numbers for powers. Am I right? I tried to look up several maths databases for verification, but they didn't have anything. I congratulate you all for all the helpful info and hard work you guys put into the site. It's great help.

As we’ll see, the teacher’s guess was wrong, as is Brett’s idea about \(x^i\). I’d like to see his work! But the question alone is a great one; most of us at this level would be satisfied just to say, as we did last time, that negative numbers are not in the domain of a log.

Doctor Barrus answered:

Hi, Brett! You're partially right. To start out with, the logarithm of a negative number in base 10 is complex. I ran into a problem in verifying what your coordinator said, though. Here's what I did, using the math software program Maple V (release 5): (In the following, I'll use log[a](b) to mean the logarithm with base a of b. For example, log[2](8) = 3, since 2^3 = 8.)

In typing, we also use the notation “log_a(b)” to indicate \(\log_a(b)\). We want to find \(\log_{10}(-10)\).

First I used the base-conversion rule of logarithms: log[a](b) = log[c](b) / log[c](a) to write: log[10](-10) = log[e](-10) / log[e](10) = ln(-10) / ln(10) (I did this because Maple can only work with complex logs when the logarithm base is e.)

The same is likely true for most humans! So far, we have that $$\log_{10}(-10)=\frac{\ln(-10)}{\ln(10)}$$

Next I found the value of ln(-10): ln(-10) = ln(10 * -1) = ln(10) + ln(-1) (Using the sum rule of logs)

This brings us a step closer to something we can calculate: $$\ln(-10)=\ln(10)+\ln(-1)$$

To deal with the ln(-1), I used Euler's Identity: e^(i*x) = cos(x) + i*sin(x). In particular, e^(i*pi) = cos(pi) + i*sin(pi) = -1 + i*0 = -1. (I'm not sure how much math you've learned. The Euler identity is something that you can derive using Taylor series, which you learn about in calculus. Also, if you haven't covered trigonometry yet, you'll have to take my word for it that when you're measuring angles in radians (not degrees), cos(pi) = -1 and sin(pi) = 0).

We met this in Euler’s Formula: Complex Numbers as Exponents. We need to turn this exponential expression into its logarithmic form.

Now, since e^(i*pi) = -1, we can write: -1 = e^(i*pi), so ln(-1) = ln(e^(i*pi)), that is, ln(-1) = i*pi

This is the same idea as saying that \(\log_2(8)=3\) because \(8=2^3\).

So, back to our equation: ln(-10) = ln(10) + ln(-1) = ln(10) + (i*pi) = 2.303 + 3.1416*i (about) which is the numerical answer Maple gives.

We used \(\ln(-1)\) to find \(\ln(-10)\), and now we use that to find \(\log_{10}(-10)\):

Now ln(10) is about 2.303, so (going back to our first formula): log[10](-10) = ln(-10)/ln(10) = [ln(10) + pi*i]/ln(10) = 1 + [pi/ln(10)]*i = 1 + 1.364*i (about). Now, raising 10 to this power gives -10, but this is not equal to i. So log(-10) is not equal to i.

So the teacher was wrong in saying the answer was merely *i*. As we’ll see, Doctor Barrus skipped over a couple tricky details, too. The important thing here was to show how to think about such a question.

Using Maple, I calculated 10^i and got -0.668 + 0.744*i (approximately), which, as you can see, is not equal to -10. So your rule isn't quite correct, but you're right about the logs of negative numbers being complex. Using the steps I used to find out log(-10), you can find out log[a](-a) for any a. You might give it a shot. Good luck!

Let’s take him up on that: $$\log_a(-a)=\frac{\ln(-a)}{\ln(a)}=\frac{\ln(a)+\ln(-1)}{\ln(a)}=\frac{\ln(a)}{\ln(a)}+\frac{\ln(-1)}{\ln(a)}=1+\frac{\pi}{\ln(a)}i$$
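For readers who want to check these numbers, here is a minimal Python sketch (not part of the original exchange). It relies on the standard `cmath` module, whose `log` returns the principal value; the helper name `log_base` is made up for this illustration.

```python
import cmath
import math

def log_base(a: float, z: complex) -> complex:
    """Principal-value logarithm of z in base a, computed as ln(z) / ln(a)."""
    return cmath.log(z) / math.log(a)

value = log_base(10, -10)
print(value)        # real part 1, imaginary part pi/ln(10), about 1.364
print(10 ** value)  # very close to -10, up to rounding error
print(10 ** 1j)     # about -0.668 + 0.744i, so log(-10) cannot be i
```

The same helper can be called with any positive base other than 1, for example `log_base(2, -2)`, to try the general formula \(1+\frac{\pi}{\ln(a)}i\).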

Observe that what happened here is that we are able to expand the **domain** of logarithms when we are willing to expand the **range** into the complex numbers! What happens when we expand the **domain** to complex numbers as well?

This 1996 question from Rod will take us into that new territory:

Log of Complex Number

Does the log of a complex number have any meaning? If so, are there log laws, and what are they?

Doctor Pete answered:

Yes, to both questions. Let's begin with complex exponentiation, if you are not already familiar with it. We define e^(i*t) = Cos[t] + i*Sin[t] where t is a real number and i is the imaginary unit. (Actually, this is not so much a definition as an extension of the exponential function on the reals, and can be "derived" in many ways.) Thus, e^(x+i*y) = e^x (Cos[y] + i*Sin[y]) = e^x*Cos[y] + i*e^x*Sin[y] and we have the formula for the exponential of a general complex number z = x + i*y expressed in Cartesian coordinates. Let's shorten this to e^z = w where z and w are both complex. Taking the logarithm of both sides, z = Log[w]. Thus, if we let w = a + i*b, and we solve for x and y in terms of a and b, we will have a formula for the complex logarithm.

So, the **complex exponential** function is $$e^z=e^{x+iy}=e^x\cos(y)+ie^x\sin(y)=a+bi=w,$$ and the **complex (natural) logarithm** is the **inverse** function, where we express *z* as a function of *w*. We need, therefore, to solve for *x* and *y* as functions of *a* and *b*.

(Please note that he is using the convention, common at the level of calculus and above, that “log” refers to the *natural* logarithm, not the *common* (base 10) logarithm. I’ll stick with his convention until near the end.)

But note that a = e^x*Cos[y] [eq. 1] b = e^x*Sin[y] [eq. 2] by equating the real and imaginary parts. So we solve for x and y. To solve for x, square equations [1] and [2] and add them together. Then a^2 + b^2 = e^(2x)*Cos[y]^2 + e^(2x)*Sin[y]^2 = e^(2x)*(Cos[y]^2 + Sin[y]^2) = e^(2x) since Cos[y]^2 + Sin[y]^2 = 1 for all y, so 2x = Log[a^2 + b^2] x = (1/2)*Log[a^2 + b^2] = Log[(a^2 + b^2)^(1/2)] = Log[Sqrt[a^2 + b^2]] . We will write Sqrt[a^2 + b^2] more compactly as |w|, which is the *magnitude* of the complex number w, the distance from w = a + i*b to the origin in the complex plane.

This tells us that the **real part of the logarithm** is just the (real) **logarithm of the magnitude** of the given complex number. How about the imaginary part?

Now, to solve for y, we divide equation [2] by equation [1], giving b/a = Sin[y]/Cos[y] = Tan[y] so y = ArcTan[b/a]

So the **imaginary part of the logarithm** is the **angle** of the complex number, also called its “argument”.

But we have to be careful of our signs, because if b < 0 and a < 0, then -Pi < y < -Pi/2, which is not evident from the ArcTan expression but is obvious from looking at it geometrically. ArcTan[b/a] is the angle formed by the line connecting w = a + i*b to the origin; this is called the *argument* of w, or Arg[w]. But say w = 1 + i. Then Arg[w] = Pi/4, but it can also be Pi/4+2*Pi = 9*Pi/4, or Pi/4-2*Pi = -7*Pi/4. In fact, we can add any integer multiple of 2*Pi to Arg[w] and still get the same angle. So, compactly written, y = Arg[w] + 2*Pi*k, for any integer k.

Here are two possible arguments for \(1+i\):

The principal argument is in red, a second option in blue.

It’s also worth pointing out that in quadrants 2 and 3, we need to adjust the angle, because those quadrants are outside the range of the arctan function. For example, \(-1+i\) is in quadrant 2:

Here the arctan would produce the angle in blue, which is negative; we have to add \(\pi\) to get the angle we want, in red.
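The quadrant adjustment is easy to see numerically. The following Python sketch is only an illustration (the function names `arctan_angle` and `principal_arg` are invented here); it compares the bare arctangent of \(b/a\) with the standard library's two-argument `math.atan2`, which takes the signs of both parts into account.

```python
import math

def arctan_angle(a: float, b: float) -> float:
    """Angle from the bare formula arctan(b/a); always in (-pi/2, pi/2)."""
    return math.atan(b / a)

def principal_arg(a: float, b: float) -> float:
    """Principal argument of a + bi, in (-pi, pi], using both signs."""
    return math.atan2(b, a)

# One point in each quadrant: 1+i, -1+i, -1-i, 1-i
for a, b in [(1, 1), (-1, 1), (-1, -1), (1, -1)]:
    print((a, b), arctan_angle(a, b), principal_arg(a, b))
```

In quadrants 1 and 4 the two columns agree; in quadrant 2 they differ by \(+\pi\), and in quadrant 3 by \(-\pi\).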

Putting this all together, we see that x = Log[|w|], y = Arg[w] + 2*Pi*k, z = x + i*y = Log[|w|] + i*(Arg[w] + 2*Pi*k), for any integer k. But z = Log[w], so Log[w] = Log[|w|] + i*(Arg[w] + 2*Pi*k), for any integer k. This is our formula! But notice that it is a bit strange: For one complex value w, there are infinitely many logarithms, because we can choose any integer k! So it is clearly not like the real logarithm. This arises because the complex exponential is many-to-one, that is, more than one value (in fact, infinitely many) of z will give the same value of e^z. Thus the inverse, Log[w], should be one-to-many, where one value of w will give infinitely many logarithms.

Once again, here is the formula: $$\log(w)=\log(|w|)+i(\arg(w)+2\pi k),\;\;k\in\mathbb{Z}$$

To illustrate the existence of infinitely many values of the log, consider that both \(e^{1+i}\) and \(e^{1+(1+2\pi)i}\) are equal to \(e(\cos(1)+i\sin(1))\approx1.46869+2.28736i\), because sine and cosine both have period \(2\pi\). Therefore, both \(1+i\) and \(1+(1+2\pi)i\) are natural logs of that same number.
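Here is a small Python sketch of the formula, offered as an illustration rather than anything from the original answers. The function name `complex_logs` and the choice of which integers \(k\) to list are assumptions made for the example; `cmath.phase` supplies the principal argument.

```python
import cmath
import math

def complex_logs(w: complex, ks=range(-2, 3)) -> list:
    """Several values of log(w): ln|w| + i(arg(w) + 2*pi*k) for each k in ks."""
    principal = math.log(abs(w)) + 1j * cmath.phase(w)
    return [principal + 2j * math.pi * k for k in ks]

w = cmath.exp(1 + 1j)  # the number used in the illustration above
for z in complex_logs(w):
    # Every listed z exponentiates back to the same w
    print(z, cmath.exp(z))
```

Each printed pair shows a different logarithm on the left and (up to rounding) the same number \(w\) on the right.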

Now that we've found a formula, the usual algebraic laws apply to it, so no additional laws need to be taken into account. The only tricky part is the one-to-many aspect of the complex logarithm - simplifications can be made by forcing Arg[w] to be in the interval [-Pi, Pi] and always taking k = 0. This is called taking the *principal value*, though depending on how you use the complex logarithm, different intervals (like [0, 2*Pi]) and values of k may have to be used. Notice this is much like taking the principal value of the square root; we usually say Sqrt[4] = 2, not Sqrt[4] = -2, though both are equally valid.

We’ll see below that just as taking the principal value of the square root, which works nicely for real numbers, causes trouble for complex (or even negative) numbers, the same is true for logarithms. See Squares, Roots, and Negative Numbers.

This answer got a reply (though it appears that a *different person*, Dave, answered, at an *earlier* time!):

So are you saying that there is more than one solution to ln(x^a) where a is complex or that there IS no solution for when a is complex?

Presumably he just means “log(a)”, or “\(e^x=a\)”; as written, there is no equation to solve.

Doctor Ceeks answered first:

The natural logarithm is not well-defined on the complex plane for the same reason that the exponentiation function is not one to one. Thus, a = b does not imply ln(a) = ln(b). Your argument is similar to the following: 4 = 4. Take square roots, and you get -2 = 2 !

In other words, it is not that there is no square root of 4; there are just too many! And we deal with that by defining a principal root.

Then Doctor Pete responded:

The former is true; that is, the logarithm of a complex number is generally not a unique value, much in the same way that the square root of a positive real number has two possible values, the positive (principal) value and its negative. This analogy was made by Dr. Ceeks.

That is, just as there are two solutions to \(x^2=4\), there are multiple solutions to, say, \(e^x=1+i\). Another analogy would be the inverse sine function; because the sine is not a one-to-one function, we have to choose one of many possible values for the inverse of that function, by restricting the domain. (See Ranges of Inverse Trig Functions.)

Let's illustrate this property further. We will call a function f an *injection* if a is not equal to b implies f(a) is not equal to f(b). Such a function is also called "one-to-one." From this definition then, we see that y = x^2 is not an injection, because if we let a = -1, and b = 1, we have f(a) = f(b) = 1. Thus it is not one-to-one. y = x, of course, is an injection. What about y = x^3? Is it an injection? Well, it depends on what domain we are considering. If we think of y = x^3 as a mapping from the reals to the reals, that is, if we only allow x and y to be real, then yes, it is an injection. But if we take it as a mapping over the complex numbers, then no, it is not, because 1^3 = (-1/2 + i*Sqrt[3]/2)^3 = 1. (There's a third complex number whose cube is 1; what is it?)

The inverse of a one-to-one function is a (single-valued) function; otherwise, it is not. Here is the graph of \(f(x)=x^2\) restricted to **non-negative** values of *x* (red), which is **one-to-one** and has the inverse \(f^{-1}(x)=\sqrt{x}\) (green):

Here, for comparison, is the graph of \(f(x)=x^2\) defined for **all real numbers** (red), which is **not** one-to-one, whose “inverse” is \(f^{-1}(x)=\pm\sqrt{x}\) (green):

The result is a one-to-many “function”.

It’s harder to graph when *x* and *y* can both be complex numbers, so I won’t attempt to illustrate that in the same way.

So for the question of injectivity to be meaningful, one must specify the domain. This is also the case for the exponential function, e^x or Exp[x]. Over the reals, e^x is injective. But in the complex plane, e^x is *not* an injection. This is because e^(x + i*y) = e^x (Cos[y] + i*Sin[y]) . Note that cosine and sine are not injective -- in fact, their periodicity (Sin[x] = Sin[x+2*Pi] = Sin[x+4*Pi] = ...) is immediate proof of this fact. It follows, then, that complex exponentiation is not injective. In fact, it is a many-to-one mapping, so infinitely many values of z will give the same value of e^z.

In Complex Powers of Complex Numbers, we illustrated how the exponential function wraps the plane around the origin; one result is that multiple points are mapped to the same point, illustrating the many-to-one nature of the exponential function.

Thus, when we consider its inverse, the natural logarithm over the complex numbers, it is not surprising that it should be a one-to-many mapping, where one value of z will result in infinitely many values of Log[z]. In particular, Log[z] = Log[Abs[z]] + i*(Arg[z]+2*Pi*k), k = ...-2,-1,0,1,2,... which is a good exercise to show. (Abs[z] is the magnitude of z, or Sqrt[x^2+y^2] if z = x + i*y. Arg[z] is the argument of z, or the angle it creates as a vector in the complex plane.)

This formula, of course, is what Doctor Pete derived in the original answer.

I know this is very long-winded, but I hope the background information I've provided here explains why the complex logarithm does funny things (which really aren't so strange in the end). Again, as Dr. Ceeks explained, your argument is analogous to the one he provided.

We’ll close with this 2004 question that reveals an odd implication of all this:

Logs of Complex Numbers

Give an example showing that Log(z1/z2) does not always equal Log(z1) - Log(z2) where z1 and z2 are complex numbers. Each time I try to plug in different complex numbers, such as z1 = e^i*pi/3 and z2 = e^2i*pi/3, I find that the two calculations are equal. I can't find a counterexample.

Doctor Vogler answered:

Hi Brittany, The complex logarithm is an interesting function. Simpler complex functions are defined everywhere except at certain points called "poles" where they behave like divisions by zero. But the logarithm has a thing called a "branch" where you have to make a jump. Because the function f(z) = e^z has the property f(z + 2*pi*i) = f(z), there isn't only one inverse (or logarithm). Well, we pick one of those inverses for the function Log, but it causes there to be a jump.

The property mentioned is exactly the many-to-one property (periodicity) of the exponential that we just looked at. A full exploration of these ideas would require a course in Complex Analysis.

Think of it this way. If you make a 3-D graph with the complex plane on the x-y coordinates and the value of the imaginary part of the logarithm on the z coordinate, then what we really *should* get is a spiral that is flat when you go straight out from the center x = y = 0 but winds upwards as you go counterclockwise around the center, and increases by 2*pi each time around.

It would look like this (a helicoid):

But that's not a function because a function can only have one value, so we limit the domain by just taking one loop. That means the spiral starts on the negative x axis at z = -pi, does one counterclockwise loop and ends on the same negative x axis but higher, at z = pi. So there is this break where the complex log jumps by 2*pi along the negative x axis.

Here we are graphing the function \(z=\text{Im}\left(\log(x+iy)\right)=\arg(x+iy)=\arctan\left(\frac{y}{x}\right)\), showing where \(z=0,1,2\):

Observe the break along the negative *x*-axis.

Everywhere else, the log behaves normally; but across the break, it does not:

So in order to make Log(z1/z2) different from Log(z1) - Log(z2), you just have to go over this break. In other words, find a Log(z1) whose imaginary part is negative near -pi and a Log(z2) whose imaginary part is positive near pi (or vice versa), and then subtracting them will be on some other part of the logarithm spiral that was cut off in order to make Log a function.

In effect, we want to find a pair of numbers that will force us to go “over the cliff” when we divide them!

Brittany replied:

I tried z1 = e^(i5pi/3) and z2 = e^(i*pi/3) so that Log(z1) = -i*pi/3 and Log(z2) = i*pi/3, and when you subtract these, you get -i2pi/3, which is what I get when I calculate Log(z1/z2). Am I not looking at the correct z1 and z2?

She chose $$z_1=e^{(5\pi/3)i}=\cos\left(\frac{5\pi}{3}\right)+i\sin\left(\frac{5\pi}{3}\right)\approx0.5-0.866i\\z_2=e^{(\pi/3)i}=\cos\left(\frac{\pi}{3}\right)+i\sin\left(\frac{\pi}{3}\right)\approx0.5+0.866i$$

Using these numbers, $$\frac{z_1}{z_2}=\frac{0.5-0.866i}{0.5+0.866i}=\frac{(0.5-0.866i)(0.5-0.866i)}{(0.5+0.866i)(0.5-0.866i)}=\frac{-0.5-0.866i}{1}=-0.5-0.866i$$

By our formula, the principal log is $$\ln(-0.5-0.866i)=\ln(|-0.5-0.866i|)+i(\arg(-0.5-0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{-0.866}{-0.5}\right)-\pi\right)\\=\ln(1)+i(\arctan(1.732)-\pi)=-2.094i$$ (The subtraction of pi is because it is in the third quadrant, so the angle is not \(\frac{\pi}{3}\) but \(\frac{\pi}{3}-\pi=-\frac{2\pi}{3}\), as Brittany pointed out.)

Or, using properties of exponents, $$\ln\left(\frac{z_1}{z_2}\right)=\ln\left(\frac{e^{(5\pi/3)i}}{e^{(\pi/3)i}}\right)\\=\ln\left(e^{(5\pi/3)i-(\pi/3)i}\right)=\ln\left(e^{(4\pi/3)i}\right)\\=\ln\left(e^{(-2\pi/3)i}\right)=\frac{-2\pi}{3}i=-2.094i$$ (Here, I replaced the exponent \((4\pi/3)i\) with the coterminal angle \((-2\pi/3)i\) to obtain the principal argument.)

Since $$\ln(z_1)=\ln(0.5-0.866i)=\ln(|0.5-0.866i|)+i(\arg(0.5-0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{-0.866}{0.5}\right)\right)\\=\ln(1)+i(\arctan(-1.732))=-1.047i$$ and $$\ln(z_2)=\ln(0.5+0.866i)=\ln(|0.5+0.866i|)+i(\arg(0.5+0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{0.866}{0.5}\right)\right)\\=\ln(1)+i(\arctan(1.732))=1.047i$$ we find that $$\ln(z_1)-\ln(z_2)=-1.047i-1.047i=-2.094i$$

This is \(-\frac{2\pi}{3}i\); no adjustments were needed because \(z_1\) and \(z_2\) are in quadrants 4 and 1 respectively, within the range of the arctan.

So she’s right; \(\ln(z_1/z_2)=-2.094i=\ln(z_1)-\ln(z_2)\) for this example.

Doctor Vogler answered:

Hi Brittany, You need imaginary parts closer to pi and -pi. Try making Log(z1) = -i*2*pi/3 Log(z2) = i*2*pi/3

Let’s repeat the process for these numbers:

We take $$z_1=e^{(-2\pi/3)i}=\cos\left(-\frac{2\pi}{3}\right)+i\sin\left(-\frac{2\pi}{3}\right)\approx-0.5-0.866i\\z_2=e^{(2\pi/3)i}=\cos\left(\frac{2\pi}{3}\right)+i\sin\left(\frac{2\pi}{3}\right)\approx-0.5+0.866i$$

Using these numbers, $$\frac{z_1}{z_2}=\frac{-0.5-0.866i}{-0.5+0.866i}=\frac{(-0.5-0.866i)(-0.5-0.866i)}{(-0.5+0.866i)(-0.5-0.866i)}=\frac{-0.5+0.866i}{1}=-0.5+0.866i$$

By our formula, the principal log is $$\ln(-0.5+0.866i)=\ln(|-0.5+0.866i|)+i(\arg(-0.5+0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{0.866}{-0.5}\right)+\pi\right)\\=\ln(1)+i(\arctan(-1.732)+\pi)=2.094i$$ (The addition of pi is because it is in the second quadrant, so the angle is not \(-\frac{\pi}{3}\) but \(\frac{2\pi}{3}\).)

Or, using properties of exponents, $$\ln\left(\frac{z_1}{z_2}\right)=\ln\left(\frac{e^{(-2\pi/3)i}}{e^{(2\pi/3)i}}\right)\\=\ln\left(e^{(-2\pi/3)i-(2\pi/3)i}\right)=\ln\left(e^{(-4\pi/3)i}\right)\\=\ln\left(e^{(2\pi/3)i}\right)=\frac{2\pi}{3}i=2.094i$$

Since $$\ln(z_1)=\ln(-0.5-0.866i)=\ln(|-0.5-0.866i|)+i(\arg(-0.5-0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{-0.866}{-0.5}\right)-\pi\right)\\=\ln(1)+i(\arctan(1.732)-\pi)=-2.094i$$ and $$\ln(z_2)=\ln(-0.5+0.866i)=\ln(|-0.5+0.866i|)+i(\arg(-0.5+0.866i))\\=\ln\left(\sqrt{0.5^2+0.866^2}\right)+i\left(\arctan\left(\frac{0.866}{-0.5}\right)+\pi\right)\\=\ln(1)+i(\arctan(-1.732)+\pi)=2.094i$$ we find that $$\ln(z_1)-\ln(z_2)=-2.094i-2.094i=-4.189i$$

This is \(-\frac{4\pi}{3}i\); adjustments were needed because \(z_1\) and \(z_2\) are in quadrants 3 and 2 respectively.

So this time, \(\ln(z_1/z_2)=\frac{2\pi}{3}i\ne-\frac{4\pi}{3}i=\ln(z_1)-\ln(z_2)\). In fact, they differ by \(2\pi i\), the “height of the cliff”.
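Both of Brittany's attempts can be replayed in a few lines of Python. This is only a sketch for checking the arithmetic; the helper name `both_sides` is invented here, and `cmath.log` is the standard library's principal-value logarithm.

```python
import cmath
import math

def both_sides(theta1: float, theta2: float):
    """Return (Log(z1/z2), Log(z1) - Log(z2)) for z1 = e^(i*theta1), z2 = e^(i*theta2)."""
    z1 = cmath.exp(1j * theta1)
    z2 = cmath.exp(1j * theta2)
    return cmath.log(z1 / z2), cmath.log(z1) - cmath.log(z2)

first = both_sides(5 * math.pi / 3, math.pi / 3)        # her first attempt
second = both_sides(-2 * math.pi / 3, 2 * math.pi / 3)  # Doctor Vogler's suggestion

print(first)                   # the two entries agree
print(second)                  # the two entries disagree
print(second[0] - second[1])   # the gap: one full turn, 2*pi*i
```

Trying other pairs of angles shows the pattern: the two sides agree whenever the difference of the principal arguments stays within \((-\pi,\pi]\), and otherwise they differ by \(2\pi i\).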


We’ll start with a broader question, from 2001:

Definition of Logarithm

In the definition of logarithm, there are some conditions for the base. a^x = b iff x = log_a(b) where a > 0, a <> 1 (a is not equal to 1), and b > 0. Why stipulate these conditions? If they are not assumed, are there any problems or contradictions that arise? I want to know why mathematicians defined logarithm this way.

We omitted such details in introducing logs, but these restrictions will be stated in any good textbook. Here is an example of such a definition (using different variables):

I answered, considering each of the three restrictions in turn:

Thanks for writing to Dr. Math. Let's just think about what happens if we violate any of these conditions. First, if b <= 0, then we are trying to solve: a^x <= 0. You will find that there is no solution; x must be "negatively infinite" even to produce 0, and there is no real number x for which a^x < 0 if a > 0.

The **range** of an exponential function consists of only positive numbers, as we can see from the graph of \(y=10^x\):

This restricts the **domain** of the logarithm, which we can see in the graph of \(y=\log_{10}(x)\):

Next, if a = 1, we are trying to solve: 1^x = b. Since 1^x = 1 for all x, this has no solution unless b = 1. So it makes no sense to talk about a logarithm with base 1.

Here are the graphs of \(y=1^x\) and \(y=\log_1(x)\) (actually \(x=1^y\), since Desmos doesn’t handle logs with base 1):

We can only take the log of 1, and then it doesn’t have a single value: \(\log_1(1)\) is indeterminate. Not very useful!

Now, if a = 0, we have: 0^x = b and again, there is no solution unless b = 0. This is likewise useless.

Here are the graphs of \(y=0^x\) and \(y=\log_0(x)\) (that is, \(x=0^y\)):

Again, the log exists only for \(x=0\), and then is equal to any (positive) number. We can’t define this as a function either.

Finally, if a < 0, things get tricky. We can define integral powers of a (which will be positive for even x and negative for odd x); but what about fractional powers? a^(1/2) is undefined (or rather, imaginary) when a is negative. If the powers are not defined for a negative base, then logarithms are not defined either!

We’ll dig into this below. It gets even more complicated than I’ve described.

As you can see, the restrictions you asked about merely keep us from talking about logarithms when they don't make any sense. BUT ... In the first and last cases, you'll notice I mentioned real and imaginary numbers. The fact is, your definition is valid only when we are considering only real numbers. We can extend the logarithm to apply to complex numbers, and then some of these restrictions can be relaxed. You can read about complex logs and some of the complications they introduce in the Dr. Math archives: The Log of a Negative Number http://mathforum.org/dr.math/problems/witty3.27.98.html Log of Complex Number http://mathforum.org/dr.math/problems/langlands.9.15.96.html

The rest of this post will look into the question of negative bases, focusing on the exponential function; next time we’ll look at these further issues when we allow complex numbers.

This question, from later in 2001, turns our attention to exponential functions:

Base of an Exponential Function

Why can't the base of an exponential function be negative?

Doctor Rob answered, starting with positive bases for comparison:

Thanks for writing to Ask Dr. Math, Stefanie. Excellent question!! Let's start by looking at things like 64^(1/2). This is a square root of 64. There are two of these square roots, -8 and +8. We want the expression to represent just one of these two, and we pick the positive one, which we call the "principal value" of the square root. Thus 64^(1/2) = 8.

Recall that \(a^{1/2}=\sqrt{a}\) because $$\left(a^{1/2}\right)^2=a^{(1/2)\cdot2}=a^1=a=\left(\sqrt{a}\right)^2$$ See Squares, Roots, and Negative Numbers for details.

And just as we define \(\sqrt{x}\) as the non-negative square root (just the one principal value, rather than both at once) in order to make it a **function**, we do the same for the fractional power.

When we look at 64^(1/3), this is a cube root of 64. There are three of these cube roots: 4, -2+2*sqrt(3)*i, and -2-2*sqrt(3)*i. Notice that two of these are complex numbers. For a principal value, naturally we pick the positive one. When we stick to positive bases, we always have a positive principal value we can use, even when the exponent is an irrational real number. Furthermore, the function you get from the reals to the positive reals turns out to have no jumps or other problem points, and to have a smooth graph (the technical terms are "continuous" and "differentiable").

Here is the graph of the cube root of *x*, \(x^{1/3}\); you can draw it by moving your pencil smoothly from one point to another:

When the base is negative and the exponent is rational with an odd denominator, like (-64)^(1/3), there is a negative real number -4 which can be chosen to be the principal value. When the base is negative and the exponent is rational with an even denominator, there is no real root. For (-64)^(1/2), you have the two complex roots 8*i and -8*i. It is not clear which of these, if either, you can or should choose for the principal value. When the base is negative and the exponent is irrational, you will also not have any real root, and no clear choice for the principal value. Furthermore, there are problems making these choices in such a way that the function resulting, mapping the reals into the complex numbers, is continuous and differentiable.

Here are the three cube roots of -64:

Here are the two square roots of -64:

The ideas used here are covered in Euler’s Formula: Complex Numbers as Exponents, and in Powers of Roots and Roots of Powers.
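The roots named in Doctor Rob's answer can be listed with a short Python sketch. It is an illustration only: the function name `nth_roots` is made up, and it uses the polar-form recipe (take the real *n*th root of the magnitude, then divide the angle, plus whole turns, by *n*).

```python
import cmath
import math

def nth_roots(x: complex, n: int) -> list:
    """All n complex nth roots of a nonzero number x, found from its polar form."""
    radius = abs(x) ** (1 / n)   # real nth root of the magnitude
    theta = cmath.phase(x)       # principal angle of x
    return [cmath.rect(radius, (theta + 2 * math.pi * k) / n) for k in range(n)]

print(nth_roots(64, 3))    # 4 and two complex roots, -2 +/- 2*sqrt(3)*i
print(nth_roots(-64, 3))   # includes the real root -4
print(nth_roots(-64, 2))   # 8i and -8i: no real root at all
```

The results carry small rounding errors (for example, a real part like `4.9e-16` where an exact zero is expected), which is normal for floating-point arithmetic.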

We’ll look into irrational exponents below.

As a result of these considerations, it is very clear that it is a good idea to restrict one's attention to exponential functions with positive bases, and avoid the difficulties encountered with negative ones.

So we *can* calculate **particular powers** of negative bases, but as a **function of a real variable**, they don’t have the properties we want for functions (particularly in calculus: changing smoothly as we vary the exponent), and so are not useful for those purposes.

Here’s a long question from 2011, taking it further:

Raising a Negative to an Irrational Power? It Depends

Dear Dr. Math, Hi, my name is Nick. I am a 10th grader attending Lake Central High School. I do things most other normal kids don't. I seem to rather enjoy math a little too much. I can understand many restrictions and rules in functions and stuff, but I came across a strange question that none of my math teachers could answer -- and it still gets me. The question is as follows: "When taking a function to an irrational power, the negative set is excluded even though the possibility of finding a positive number to an irrational power IS possible. Why is finding a negative to an irrational power not?" I have asked this to all of my teachers and my father (they are the best for my math answers) and even other relatives (though they're not really any help) -- but no one has had a clear answer. So, Dr. Math, why can't you find a value for a negative number raised to an irrational exponent? Or maybe you can; does it deal with logarithmic functions? Please try it. For example, type this on your calculator and see what you get: (-3)^(sqrt3) I seem to continue to get "Error"; and when writing it out, I am stuck. This is not only for the square root of three as an exponent, but other irrationals, such as (-3)^e, and imaginaries, such as (-3)^i. I know i is the square root of (-1), but these all seem to not have any real value, so I am just truly baffled. Please, don't discard this request.

My TI-30X IIS reports “Domain error” when I enter (-3)^√(3), but correctly finds \((-3)^\frac{1}{3}=-1.44224957\) and \((3)^\sqrt{3}=6.704991854\). So “negative^rational” and “positive^irrational” work (at least sometimes), but “negative^irrational” does not. My Casio fx-115ES PLUS is similar.

Doctor Vogler answered:

Hi Nick, Thanks for writing to Dr. Math. I enjoy math more than other adults do, too, and I am proud of that fact. There are a number of different ways to approach your question. There is some discussion of this issue at

- Base of an Exponential Function http://mathforum.org/library/drmath/view/55604.html
- Graph of y = (-n)^x http://mathforum.org/library/drmath/view/66708.html
- Why Is (-n)^fractional Invalid? http://mathforum.org/library/drmath/view/62979.html
- y to the x Power http://mathforum.org/library/drmath/view/63367.html

The links are to our second question above, the question we’ll close with below, one that I considered for this post but couldn’t fit, and another that covers similar ideas.

Basically, it all comes down to: What do you mean when you write an expression like x^y? The first definition of exponents you have likely seen was the algebraic definition, where you define x^n for positive integers n as x times itself n times, then x^(1/n) as a number y such that y^n = x, and x^(p/q) as (x^p)^(1/q). You might have even seen x^y for irrational numbers y defined as a limit (if you've encountered limits; they are usually introduced in calculus or pre-calculus). In this setting, negative numbers to irrational exponents are not defined because the limit I spoke of generally does not exist; when it does, it depends on the choice of sequence approaching y.

As he says, we first define **positive integer** powers, $$3^4=3\cdot3\cdot3\cdot3=81,$$ then **negative integer** and **zero** powers, $$3^0=1,\\3^{-4}=\frac{1}{3^4}=\frac{1}{81}\approx0.012345679\dots,$$ then **fractional** powers, $$3^{\frac{3}{4}}=\sqrt[4]{3^3},$$ and then **real** (irrational) powers, so that $$3^\pi=3^{3.14159\dots}\approx31.544,$$ which we approach as the exponent approaches \(\pi\): $$3^3=27,\;3^{3.1}\approx30.135,\;3^{3.14}\approx31.489,\;\dots$$
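To watch that limit take shape, here is a short Python sketch (an illustration, with the variable names `exponents` and `powers` chosen for this example). It truncates \(\pi\) to more and more decimal places and raises 3 to each truncated, rational exponent.

```python
import math

# Truncate pi to 0, 1, 2, ... decimal places: 3, 3.1, 3.14, 3.141, ...
exponents = [math.floor(math.pi * 10 ** d) / 10 ** d for d in range(7)]
powers = [3 ** e for e in exponents]

for e, p in zip(exponents, powers):
    print(e, p)
print("3^pi:", 3 ** math.pi)   # the value the sequence is closing in on
```

Because the base is positive, every term is a well-defined real number and the sequence settles down; the point of Doctor Vogler's remark is that nothing comparable happens when the base is negative.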

This idea is discussed from a variety of perspectives in What Do Exponents Mean?

But, as we’ve seen, this doesn’t consistently work for negative bases.

A different way to define exponents uses calculus and first defines the natural logarithm ln(x) as an integral, or exp(x) (also written e^x) with a differential equation, and the other as the inverse function. In this context, x^y = exp(y*ln(x)). But then you have the problem that the integral for ln(x) is only defined when x is positive, so this definition only works when the base x is positive.

Different textbooks can take different approaches to these equivalent definitions:

- One is to define $$\ln(x)=\int_1^x\frac{1}{t}dt,$$ and then define \(e^x\) as the inverse function, so that $$e^x=y\text{ when }x=\ln(y).$$
- The other is to define \(e^x\) as the function for which $$\frac{d}{dx}f(x)=f(x)\text{ and }f(0)=1,$$ and then define \(\ln(x)\) as the inverse function, so that $$\ln(x)=y\text{ when }x=e^y.$$

But by either of these definitions, we can’t define \(x^y\) for negative *x* because we can’t define \(\ln(x)\).
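The integral definition can be imitated numerically. The sketch below is a rough illustration, not how any calculator or library actually computes logs: `ln_by_integral` (a name invented here) estimates \(\int_1^x\frac{1}{t}dt\) with the midpoint rule, and `power` then follows the rule \(x^y=e^{y\ln(x)}\).

```python
import math

def ln_by_integral(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of 1/t from 1 to x, for x > 0."""
    if x <= 0:
        # The path from 1 to x would have to cross t = 0, where 1/t blows up
        raise ValueError("the integral definition only covers x > 0")
    h = (x - 1) / steps   # negative when x < 1, so the result is negative too
    return sum(h / (1 + (i + 0.5) * h) for i in range(steps))

def power(x: float, y: float) -> float:
    """x^y computed as exp(y * ln(x)), using the integral-based logarithm."""
    return math.exp(y * ln_by_integral(x))

print(ln_by_integral(10), math.log(10))   # the two should agree closely
print(power(3, math.sqrt(3)))             # about 6.705
```

Calling `power(-3, math.sqrt(3))` raises the `ValueError`, which mirrors the "Domain error" a calculator reports for the same input.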

A third way uses complex analysis and imaginary numbers, where the function exp(x) can be defined for complex numbers. But then exp(x) has many inverses, so the natural logarithm ln(x) is a multi-valued function. Lucky for us, exp(y*ln(x)) is always the same, no matter which value you pick for ln(x), as long as y is an integer. But if y is rational, then different choices of ln(x) will give different values for x^y, and the number of different values you can get is equal to the denominator of y.

We’ll be looking at multivalued logarithms (and powers) in more detail another time, but we’ve touched on it just enough here to see some of the complexity.

For example, the four values for 1^(1/4) are 1, -1, i, and -i. When y is irrational, then there are infinitely many different values for x^y. If x is a positive real number, and y is a real irrational number, then exactly one of those infinitely many different values for x^y is a real number. But if x is a negative real number, and y is a real irrational number, then none of those infinitely many different values for x^y is a real number; all of them are complex numbers.
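Here is a rough Python sketch of that multi-valued power, using the fact that the possible values of ln(x) for real nonzero x are ln|x| + i(θ + 2πk), where θ is 0 for positive x and π for negative x. The function name `power_values` is mine, and a finite range of k can only sample the infinitely many values:

```python
import cmath
import math

def power_values(x, y, branches):
    """Values of exp(y * ln(x)), one for each branch number k in `branches`."""
    ln_abs = math.log(abs(x))
    theta = cmath.phase(complex(x))   # 0 for positive x, pi for negative x
    return [cmath.exp(y * complex(ln_abs, theta + 2 * math.pi * k))
            for k in branches]

# The four values of 1^(1/4), in the order k = 0, 1, 2, 3
for v in power_values(1, 0.25, range(4)):
    print(complex(round(v.real, 12), round(v.imag, 12)))

# Count (nearly) real values among 101 branches of an irrational power
for base in (2, -2):
    values = power_values(base, math.sqrt(2), range(-50, 51))
    count = sum(1 for v in values if abs(v.imag) < 1e-9)
    print(base, "has", count, "real value(s) among the sampled branches")
```

For the positive base, only the branch k = 0 gives a real value; for the negative base, none of the sampled branches does.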

My impression has generally been that calculators would evaluate powers of any base using the natural log: \(a^b=\left(e^{\ln(a)}\right)^b=e^{b\ln(a)}\), in order to use their logarithm algorithm. But that can only be done for positive bases, so they would just call a negative base an error.

On the other hand, as we’ve seen, a calculator should be able to calculate *some* **rational** powers of a negative base, such as \((-64)^{1/3}\); both my TI 30X IIS and my Casio fx-115ES Plus can handle that; clearly they don’t rely on logarithms.

On the other hand, a calculator that can handle complex numbers probably uses polar form, and does whatever it can to obtain a single value. This is discussed in

x^x, Discrepantly
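Python happens to show both behaviors side by side. This is only an analogy, since I don't know what any particular calculator does internally; `real_only_power` is a name I made up for the sketch:

```python
import math

def real_only_power(base, exponent):
    """Behave like a real-only calculator: give up on a negative base
    with a non-integer exponent (math.pow raises ValueError there)."""
    try:
        return math.pow(base, exponent)
    except ValueError:
        return "error"

print(real_only_power(-64, 1 / 3))   # the real-only route reports an error
print((-64) ** (1 / 3))              # the ** operator returns one complex value
```

The complex result is the principal value, about 2 + 3.46i, rather than the real cube root -4. That is one way of "obtaining a single value".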

This is from 2005:

Graph of y = (-n)^x I am curious as to what the graph of y = (-n)^(x) would look like, such as y = (-2)^x. My graphing calculator will not show the graph as anything, but has many real values in a table of values. It is confusing because a negative number to the power of let's say 0.6 (3/5, so odd root) is a real number while a negative number to the power of 0.7 (7/10, so even root) would be an imaginary number. This would mean there are many random points that are both real and imaginary on a graph of this sort. Is it possible to construct a graph of y = (-n)^x because of this interesting coincidence that happens with odd and even roots of the negative numbers?

Each graphing tool will do something a little different.

Doctor Vogler answered:

Hi Alexander, Thanks for writing to Dr. Math. It depends partly on what you mean by "(-n)^x". You see, generally exponents are either defined algebraically for rational exponents or continuously (through calculus) for positive bases. If you have a negative base and want all exponents (including irrationals) then things get ugly. See also Base of an Exponential Function http://mathforum.org/library/drmath/view/55604.html

This is our second question above.

Notice that in the case of all irrational and many rational exponents, the only possible value for y is complex. When graphing, you generally don't graph complex solutions. (Did you plot (i+1, 2i) on your graph of y = x^2? I didn't think so.) So your graph will essentially be dotted lines where the rational values take you. If you plotted those values where x is rational, then you would have a dotted line along y = n^x (since y = n^x when x has an even numerator and odd denominator) and a dotted line along y = -(n^x) (since -y = n^x when x has an odd numerator and odd denominator).

A graphing tool doesn’t know the general picture; it typically chooses regularly spaced values for *x* and plots points using whatever values of *y* it finds (if it tries at all). Here is what Desmos shows for \(y=(-2)^x\):

It’s made a valiant effort, showing whatever points it found that work; but those are not *all* the values of *x* that represent fractions with an odd denominator. For instance, it shows positive values \((-2)^0=1\), and \((-2)^{1.746}=3.354\), and negative values \((-2)^{-1.032}=-0.489\), and \((-2)^{1.032}=-2.044\). All of these lie on the graph of \(y=\pm2^x\):

(Zooming in or out changes what points are shown.)
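The numerator-and-denominator rule is easy to sketch in Python with exact fractions. The function `real_power_of_negative` is hypothetical, written just for this illustration; note that it needs an exact `Fraction`, because a float like 0.6 is stored as a binary fraction whose denominator is a power of 2:

```python
from fractions import Fraction

def real_power_of_negative(n, x):
    """Real value of (-n)**x for n > 0 and rational x, or None if none exists."""
    x = Fraction(x)                  # Fraction keeps x in lowest terms
    if x.denominator % 2 == 0:
        return None                  # even root of a negative number
    magnitude = n ** float(x)
    # Even numerator: the point lies on y = n^x; odd numerator: on y = -(n^x)
    return magnitude if x.numerator % 2 == 0 else -magnitude

for x in (Fraction(3, 5), Fraction(7, 10), Fraction(2, 3), Fraction(0), Fraction(-3)):
    print(x, real_power_of_negative(2, x))
```

Under these assumptions, 3/5 gives a negative real value, 7/10 gives no real value, and 2/3 gives a positive one, matching the two dotted curves described above.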

If you want to consider what are the complex values of y = (-n)^x, then you have a completely different beast, since there is more than one possible value for y unless x is an integer, and there are infinitely many such values unless x is rational. See also Complex Powers http://mathforum.org/library/drmath/view/60383.html

This question is about oddities that arise in complex powers of real numbers (such as \(1^i\)). Similar ideas are found in Complex Powers of Complex Numbers.


Our first question, from 1997, asks about natural logarithms in particular, but gets a full, though informal, explanation of how logs work:

Natural Logs Hello, I have tried for the past two weeks to find what I could on logs, and mostly natural logs, but have found nothing that I could understand. I need to understand natural logs. Could you please explain them in the simplest terms possible? I don't understand what they are used for. Are they used for other subjects in Math?

Doctor Steven answered:

Logarithms can be pretty tricky. The thing to remember is that a logarithm is just another way to write an equation that looks like this: x = a^y. You see, normally we like to write functions like y = something, but here we can't do that since the y is in the exponent. So what we do is create some notation that will let us write this like y = something. This notation is the logarithm. We say log_a(b) is the log base a of b. And so we have y = log_a(x), which means the same thing as x = a^y.

In the usual notation, we say $$y=\log_a(x)\;\;\Leftrightarrow\;\;x=a^y$$

In other words, the (base *a*) logarithm is the **inverse** of the (base *a*) exponential function; it exists in order to solve exponential equations for a variable in the exponent, by “undoing” the exponentiation.

This implies the **inverse property**, which can be expressed in two ways: $$\log_a(a^x)=x$$ $$a^{\log_a(x)}=x$$

It also leads to several **special cases**: $$\log_a(1)=\log_a(a^0)=0$$ $$\log_a(a)=\log_a(a^1)=1$$ $$\log_a\left(\frac{1}{a}\right)=\log_a(a^{-1})=-1$$

Let's do some examples:

1. x = 10^4 is the same thing as log_10(x) = 4.
2. x = 10^y is the same thing as log_10(x) = y.
3. x = e^y is the same as log_e(x) = ln(x) = y.

In the third example we see that the natural log is just log base e.

We saw last time that \(e\approx2.71828\) is a special number that makes a log with the special property that its slope at its *x*-intercept is 1.

What follows are largely proofs-by-example; we’ll see more precise derivations below.

Now let's get into some properties of logarithms. Say we have log_a(x) = 3. This is the same as a^3 = x. Say also we have log_a(y) = 4 - this means a^4 = y. What is log_a(x*y)? Well multiply x and y and we get x*y = a^3*a^4. When multiplying powers we add the exponent so x*y = a^(3 + 4). So log_a(x*y) = 3 + 4 = log_a(x) + log_a(y). So we have our first property of logs: log_a(x*y) = log_a(x) + log_a(y). This property works for any numbers so log_4(20) = log_4(4*5) = log_4(4) + log_4(5) = 1 + log_4(5).

So the product property of exponents $$a^xa^y=a^{x+y}$$ becomes the product property of logarithms: $$\log_a(xy)=\log_a(x)+\log_a(y)$$ When we **multiply** numbers, we **add** the corresponding exponents, which are the logarithms.

The second property is closely related to the first. It states that: log_a(x/y) = log_a(x) - log_a(y). The third property is also related to the first property. It states that: log_a(x^n) = n*log_a(x). We can see this property by seeing that x^n = x*x*x*x.... (n times). So we get log_a(x^n) = log_a(x) + log_a(x) + . . . (n times) = n*log_a(x).

The quotient property of exponents $$\frac{a^x}{a^y}=a^{x-y}$$ becomes the quotient property of logarithms: $$\log_a\left(\frac{x}{y}\right)=\log_a(x)-\log_a(y)$$

When we **divide** numbers, we **subtract** the corresponding exponents, which are the logarithms.

As an example, since $$\frac{100,000}{100}=\frac{10^5}{10^2}=10^{5-2}=10^3=1000$$ we know that $$\log_{10}\left(\frac{100,000}{100}\right)=\log_{10}(100,000)-\log_{10}(100)=5-2=3$$

The power property of exponents $$\left(a^x\right)^n=a^{nx}$$ becomes the power property of logarithms: $$\log_a(x^n)=n\log_a(x)$$

When we **raise** a number to a power, we **multiply** the exponent, which is the logarithm.

As an example, since $$100^3=\left(10^2\right)^3=10^{2\cdot3}=10^6=1,000,000$$ we know that $$\log_{10}\left(100^3\right)=3\log_{10}(100)=3(2)=6$$

So we have 3 properties for logs:

1. log_a(x*y) = log_a(x) + log_a(y)
2. log_a(x/y) = log_a(x) - log_a(y)
3. log_a(x^n) = n*log_a(x)

We use these properties to change an equation into the form we wish, to make it easier to work with.
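A quick numerical spot-check of the three properties, as a sketch in Python. The sample values (base 4, with 20, 5, and 3) are my own choice, and since floating-point logs are approximate the two sides are compared with `math.isclose` rather than `==`:

```python
import math

a, x, y, n = 4.0, 20.0, 5.0, 3

# Each entry pairs the left side of a property with its right side.
checks = {
    "product":  (math.log(x * y, a), math.log(x, a) + math.log(y, a)),
    "quotient": (math.log(x / y, a), math.log(x, a) - math.log(y, a)),
    "power":    (math.log(x ** n, a), n * math.log(x, a)),
}
for name, (left, right) in checks.items():
    print(f"{name:8s} {left:.12f} {right:.12f} {math.isclose(left, right)}")
```

A check like this proves nothing, of course, but it is a handy way to catch a misremembered rule.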

I like to point out that taking logs turns multiplication, division, and exponentiation into addition, subtraction, and multiplication, respectively. We can diagram this:

PEMDAS, anyone? Applying logs moves us down one rung on the hierarchy of operations.

If you’re wondering about the missing companion of exponents on the top line, we can think of **roots** as the opposite of exponents, as division is the opposite of multiplication, giving a rule $$\log_a(\sqrt[n]{x})=\log_a(x^{1/n})=\frac{\log_a(x)}{n}$$ showing that when we **take a root** of a number, we **divide** the exponent, which is the logarithm. So we can fill out the diagram:

But we don’t need to learn this as a separate property, since we usually just rewrite a root as a fractional power.

This also reminds us that there is no property of logs that applies to the log of a sum or difference; there is no lower operation.

Another thing to worry about with logs is the change of base formula. The reason we have a change of base formula is because your calculator probably only has buttons for log base 10 or the natural log of a number. Unfortunately not many problems will have logs that are in base 10 or base e, so in order to find out the exact value for these problems we need to change the base of the logarithm to either 10 or e so we can plug them into our calculator. The change of base formula goes like this: log_a(b) = log_10(b) / log_10(a) or log_a(b) = ln(b) / ln(a). An easy way to remember which number goes on top is to note the b is above the a on the left side of the equation and on the right side it is still above the a.

Today, many calculators have a \(\require{AMSsymbols}\text{LOG}_\square\) button that lets you choose the base. But there are also other reasons to change bases. He forgot to show the reason for this formula; we’ll get to that soon. In general, the formula says $$\log_a(b)=\frac{\log_\square(b)}{\log_\square(a)}$$ where the box could be any base you want.

We’ll derive this formula below.

Let's do some examples of the change of base formula:

1. log_7(100) = log_10(100) / log_10(7)
2. log_5.6(34) = log_10(34) / log_10(5.6)
3. log_2.3(1.7) = ln(1.7) / ln(2.3)

You can plug these into your calculator to find the actual decimal value for these logarithms.

Sometimes, you don’t need a calculator to find a log to a different base; for example, since \(25^{1/2}=\sqrt{25}=5\), we know that \(\log_{25}(5)=\frac{1}{2}\). In such a case, the change of base formula provides a long way (useful as a check): $$\log_{25}(5)=\frac{\log_{10}(5)}{\log_{10}(25)}=\frac{0.69897}{1.39794}=0.5$$
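If your "calculator" is Python, the same examples look like this. The helper `log_base` is a name I chose for the sketch; it deliberately uses only the base-10 log, as if that were the only button available:

```python
import math

def log_base(a, b):
    """log_a(b) by the change of base formula, using only base-10 logs."""
    return math.log10(b) / math.log10(a)

for a, b in [(7, 100), (5.6, 34), (2.3, 1.7), (25, 5)]:
    value = log_base(a, b)
    # Raising the base to the result should give back b (up to rounding).
    print(f"log_{a}({b}) = {value:.5f}   check: {a}^{value:.5f} = {a ** value:.5f}")
```

The check column raises the base to the computed log, which should reproduce the original number.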

Here’s a similar question, from 2001:

Logarithms' Relation to Exponents I have a bunch of rules for logs, properties and suchlike, but I find it hard to remember them without a proof. My precalculus book has no proof of why logs work or even what they are, nor does my calculus book. I understand what logs are, and their relation to Euler's constant, but I don't understand why they are what they are. Please help me.

Doctor Fenton answered:

Hi Vid, Thanks for writing to Dr. Math. The way I like to think about logarithms is that they are just another language for describing exponentials. In exponential language, we emphasize the base; logarithms emphasize the exponent. Each property of logarithms is just a property of exponentials, expressed from a new point of view in the logarithmic language. I assume that you know the fundamental exponential properties: if a > 0, then (1) a^m*a^n = a^(m+n), (2) a^m/a^n = a^(m-n), and (3) (a^m)^n = a^(m*n).

We’ll translate these three rules into log terms:

If a^x = X (note the capital and lower case letters: x is the exponent, and X is a raised to that exponent), then we can rephrase this as log_a (X) = x , where log_a denotes the logarithm to the base a. This logarithmic statement is exactly a restatement of the exponential relation a^x = X .

For any variable in upper case, we’ll use lower case for the corresponding exponent (that is, log). This makes the relationships easy to keep track of.

Logarithmic properties are likewise just restatements of the exponential properties above. To translate, suppose (*) a^x = X and a^y = Y . Restating these relations as logarithms gives (**) log_a (X) = x and log_a (Y) = y.

Now we make those substitutions in the exponent properties.

Then exponent rule (1), which says that a^x * a^y = a^(x+y), can be written as X * Y = a^x * a^y = a^(x+y), which can be restated as log_a (X*Y) = x + y. Substituting for x and y using (**) gives the first rule for logarithms, (1') log_a (X*Y) = log_a (X) + log_a (Y).

$$a^xa^y=a^{x+y}$$ becomes $$\log_a(XY)=\log_a(X)+\log_a(Y)$$

In exponent form, we can say that the **product of powers** is the **power of the sum**; in log form, we say that the **log of a product** is the **sum of the logs**. Or you might say, **multiplying** powers **adds** the exponents, and **multiplying** numbers **adds** the logs.

Exponent rule (2) becomes X/Y = a^x / a^y = a^(x-y), so log_a (X/Y) = x - y, or (2') log_a (X/Y) = log_a (X) - log_a (Y).

$$\frac{a^x}{a^y}=a^{x-y}$$ becomes $$\log_a\left(\frac{X}{Y}\right)=\log_a(X)-\log_a(Y)$$

Dividing numbers subtracts the logs.

Finally, exponent rule (3) becomes X^n = (a^x)^n = a^(x*n), which can be restated as log_a (X^n) = x*n, or (3') log_a (X^n) = n * log_a (X). Those are the basic properties of logarithms. Any additional properties you need can be derived from these.

$$\left(a^x\right)^n=a^{nx}$$ becomes $$\log_a(X^n)=n\log_a(X)$$

A 2007 question gets to the formula we mentioned, but didn’t prove, above:

Deriving the Change of Base Formula for Logarithms In my Pre-Calculus class we have been learning about the properties of logarithms. Since calculators have only two bases--base 10 and base e, we have learned how to change the base of a logarithm using this formula: log_a(x) = log_b(x) / log_b(a). I know how to use this formula, but I have no idea why it works. Can someone give me a proof explaining why this works? Thanks a lot.

I answered:

Hi, Joe. It's not too hard to prove, though the notation can get messy! I often accidentally derive the formula in the course of solving logarithmic equations. Let's start by calling the log we're looking for y: log_a(x) = y. Now we write that in exponential form: x = a^y. (That is, I raised the base a to the exponent on each side of the original equation.)

This relationship expresses the fact that the logarithm is the **inverse** of the exponential function.

Now we want to solve this equation for y, using only base b logs, not base a logs. To do this, we take the log of each side: log_b(x) = log_b(a^y). Now we simplify the right side: log_b(x) = y log_b(a)

Here we have used the power rule.

To get y by itself, we just have to divide both sides by log_b(a): log_b(x) / log_b(a) = y Substituting log_a(x) back in for y we have: log_a(x) = log_b(x) / log_b(a) And we're done!

$$\log_a(x)=\frac{\log_b(x)}{\log_b(a)}$$

As you can see, the formula is just the natural result of solving using an available log; that's why I so often get a result looking like this, and then slap myself on the head and say, "I coulda used the base-change formula!".

For example, suppose I were solving the equation $$4^x=5$$

I could solve that by taking the base-4 log of both sides, $$\log_4(4^x)=\log_4(5)\\x=\log_4(5)$$

But then I realize I want a decimal value, and my calculator lacks a general log button. So I try again, using base 10: $$\log_{10}(4^x)=\log_{10}(5)\\x\log_{10}(4)=\log_{10}(5)\\x=\frac{\log_{10}(5)}{\log_{10}(4)}\approx\frac{0.69897}{0.60206}\approx1.16096$$

But the work I did is identical to the change of base formula.

There are a couple of special cases worth observing. If we take *x* itself as the new base, we get what we might call the “base exchange formula”: $$\log_a(x)=\frac{1}{\log_x(a)}$$

If we raise the base to a power, we get $$\log_{a^n}(x)=\frac{\log_{a}(x)}{\log_{a}(a^n)}=\frac{\log_{a}(x)}{n}$$

And if we raise both base and argument to the same power, we leave the result unchanged: $$\log_{a^n}(x^n)=\frac{\log_{a}(x^n)}{\log_{a}(a^n)}=\frac{n\log_{a}(x)}{n}=\log_{a}(x)$$

And what if we take the reciprocal of the base? $$\log_{1/a}(x)=\frac{\log_{a}(x)}{\log_{a}(1/a)}=\frac{\log_{a}(x)}{-1}=-\log_{a}(x)$$
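These four special cases can be spot-checked numerically. Here is a sketch with sample values of my own choosing (base 3, argument 7, power 4), using Python's two-argument `math.log(x, base)`:

```python
import math

a, x, n = 3.0, 7.0, 4

# Each entry pairs the left side of a special case with its right side.
checks = {
    "base exchange":      (math.log(x, a), 1 / math.log(a, x)),
    "power of the base":  (math.log(x, a ** n), math.log(x, a) / n),
    "same power on both": (math.log(x ** n, a ** n), math.log(x, a)),
    "reciprocal base":    (math.log(x, 1 / a), -math.log(x, a)),
}
for name, (left, right) in checks.items():
    print(f"{name:20s} {left:.12f} {right:.12f} {math.isclose(left, right)}")
```

Agreement up to rounding is all we can expect from floating-point logs.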

We’ve stated the properties briefly, as we often do; but that misses some subtleties brought out by this 2002 question:

Error in One of the Laws of Logarithms? We were discussing a problem in precalculus today and seemed to discover a basic flaw in one of the exponent laws. Recall: log(x^2) = 2log(x). It is also a fact that the log functions have a domain restriction on all values less than or equal to zero. However: For log(x^2), the only value that is restricted (less than or equal to zero) is zero itself (log(0^2) = log(0)). For 2log(x), which should supposedly be the same function by the law stated above, there are restrictions on all values of X less than or equal to zero. Basically, what it comes down to is that negative numbers are an acceptable input for the function before you apply the law, and negative numbers are no longer an acceptable input after you apply the law. This means that the two functions are not the same, and inherently disproves that law of logarithms. Doesn't it? We're thoroughly perplexed, and resorted to the Internet for assistance.

Two functions are considered the same only if they have the same values for all inputs, which implies that they have the same domain (the same “all inputs”). But when we apply this rule, we change the domain!

Here, for example, if we take \(x=-1\), then $$\log\left(x^2\right)=\log\left((-1)^2\right)=\log(1)=0$$ but $$2\log(x)=2\log(-1)\text{ is not defined}$$ Is the rule wrong?

I answered:

Hi, Charles. You haven't shown that the law is wrong, but only that it has an implicit restriction: log(a^b) = b log(a) for all a and b for which both logarithms are defined. If a is negative and b even, then the left side is defined but the right side is not. You are taking the property to mean that the function on the left is identical to the function on the right, including having the same domain; but that's not what it means. It is only a pointwise identity (true for one pair of values at a time), not a statement about the two functions as a whole.

One effect of this is that in solving equations involving logs, we can inadvertently change the domain, so that a value that is valid in the original equation is no longer valid in the new equation, or (more often) vice versa.

The same can be said of the other logarithm identities, such as log(ab) = log(a) + log(b) where, if a and b are both negative, the left side is defined but the right is not.

For example, if we take \(a=-2\) and \(b=-5\), then $$\log(ab)=\log((-2)(-5))=\log(10)=1$$ but $$\log(a)+\log(b)=\log(-2)+\log(-5)\text{ is not defined}$$

We even face the same problem with simpler facts: (sqrt(x))^2 = x is true whenever sqrt(x) is defined; it is not wrong just because there are values of x for which only the right side is defined. We just have to clearly state the restriction: "for all x >= 0".

This issue is mentioned in the context of solving equations (where it can cause either missed solutions or extraneous solutions) in Extraneous Solutions: Causes and Cures. Here is a 2005 question that was linked there:

Are Properties of Logarithms Missing Something? I have a question about using the properties of logs to solve equations. For example, I can solve this equation in two ways:

ln x^2 = -7
e^(ln x^2) = e^(-7)
x^2 = e^(-7)
x = plus or minus (e^(-7/2))

or

ln x^2 = -7
2 ln x = -7
ln x = -7/2
x = e^(-7/2)

Using the second method, you only get the positive answer. Where did I lose the negative answer? Why is this not taken into account when using properties like this are discussed?

The first time, Sara raised *e* to the power on each side of the equation, then took the square root(s) to find both solutions, \(\pm e^{-7/2}\). The second time, she applied the power rule first, and didn’t need to take a root, but only got the positive solution. What happened?

I answered:

Hi, Sara. Good question! This is a fact that is sometimes swept under the rug: when you apply some of these properties, the domain of the expression changes. In particular, although log(a^n) = n log(a) for any a and n for which both sides are defined, the left side is defined when a < 0 and n is even, but the right side is not. So in applying the property, you lose negative values of a. Sometimes books mention this, but then avoid it by specifying that the variable is positive when giving equations to solve. Some books may not mention it at all, which is not a good idea.

In applying the property, she reduced the domain of the problem, implicitly assuming *x* is positive.

Another way it could be dealt with would be to use the following when n is even: log(a^n) = n log|a|. Authors probably don't want to subject their students to a mix of absolute values and logs, which would be too much for many of us! But this would retain all solutions in your problem:

ln x^2 = -7
2 ln |x| = -7
ln |x| = -7/2
|x| = e^(-7/2)
x = +- e^(-7/2)

I’m quite sure I’ve never seen the property stated this way, which is similar to the fact that \(\sqrt[n]{x^n}=|x|\) for even index *n*.
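Python's `math.log` enforces the real domain, so it makes a convenient way to see the domain change in Sara's equation. This is only a sketch; the three function names are mine:

```python
import math

def original(x):
    return math.log(x ** 2)        # defined for every x except 0

def power_rule(x):
    return 2 * math.log(x)         # math.log raises ValueError unless x > 0

def with_abs(x):
    return 2 * math.log(abs(x))    # defined for every x except 0

root = math.exp(-7 / 2)
for x in (root, -root):
    try:
        rewritten = power_rule(x)
    except ValueError:
        rewritten = "undefined"
    print(f"x = {x:+.6f}: ln(x^2) = {original(x):.6f}, "
          f"2 ln(x) = {rewritten}, 2 ln|x| = {with_abs(x):.6f}")
```

Both candidates satisfy the original equation and the absolute-value form, but only the positive one survives the plain power rule.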

We’ll start with this 1998 question:

'Log' Button In math we have been using the 'log' button on our calculators to solve problems that involve compound interest, etc., and our teacher says we'll learn what the 'log' button does next year. But we want to know now! The teacher says it has something to do with the opposite of exponents, but I still have no clue.

I sometimes struggle with the order in which to explain things, because when you need to *use* a new concept, sometimes the students aren’t ready for the concept on which it is based, so they can’t fully *understand* it (and perhaps don’t really need to). So we may have to tell a student, just use it for now. But I’m with the student: Some sort of explanation would be nice.

Doctor Sam answered Brad in that spirit:

Well, logarithms can be a big topic and if you are interested you might try reading about them in another math book. But for a quick introduction, here goes: First, the word logarithm is really a synonym for the word exponent or power.

I tell students, “the logarithm is the exponent”; what I mean by that is that when we see that, say, \(10^4=10,000\), we can say that 4 (the **exponent**) is the **logarithm** of 10,000. And that’s what the teacher means by “the opposite of exponents”: When we find a logarithm, we are **finding what exponent is needed** to get that number. We’re working backward.

Second, there are lots of different kinds of logarithms depending upon the base you are using. The LOG key on your calculator is probably "base 10 logarithm". Try this: make a table of values of different powers of 10, like this:

| power x | -2 | -1 | 0 | 1 | 2 | 3 | ... |
|---|---|---|---|---|---|---|---|
| 10^x | 0.01 | 0.1 | 1 | 10 | 100 | 1000 | ... |

Ordinarily we read these as "10 to the power 2 equals 100" or "10 to the power 3 equals 1000." This is fine when you need to calculate a power of ten, but what if you know the answer and need to find the exponent? To solve 10^x = 10000, for example, you have to think "ten to what power is ten thousand?" That's not too hard. But what if the problem is 10^x = 25?

So far we just have a *name* (“logarithm”, or “log” for short) for what we want to do. The table gives us a *method*, but only in the easiest cases.

LOGARITHMS were invented (back in the sixteenth century I think) to answer these kinds of questions. LOG(25) means "the power of 10 that produces 25" and LOG(1000) = 3 because "the power of 10 that produces 1000 is 3." You can make a little table of logarithms just by switching the two rows of my table above:

| 10^x | 0.01 | 0.1 | 1 | 10 | 100 | 1000 | ... |
|---|---|---|---|---|---|---|---|
| power x | -2 | -1 | 0 | 1 | 2 | 3 | ... |

only now the second row is called the logarithm. So here it is one more time:

| n | 0.01 | 0.1 | 1 | 10 | 100 | 1000 | ... |
|---|---|---|---|---|---|---|---|
| LOG(n) | -2 | -1 | 0 | 1 | 2 | 3 | ... |

The trouble is, when we want to fill in numbers in between, none of the arithmetic you’ve learned can do it; there is no simple formula for the log. (That’s why we need a name, and a calculator button for it!)

What we’re doing here is finding the **inverse of the exponential function**: $$\text{If }y = 10^x\text{, then }x=\log(y)$$
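To make "working backward" concrete, here is a sketch that finds the exponent by trial, homing in by bisection and using nothing but powers of 10. The function `exponent_for` is my own illustration, not how a calculator's LOG key actually works:

```python
import math

def exponent_for(target, base=10.0, tolerance=1e-12):
    """Find x with base**x == target by bisection (base > 1, target > 0)."""
    lo, hi = -1.0, 1.0
    while base ** lo > target:     # widen the bracket downward if needed
        lo *= 2
    while base ** hi < target:     # widen the bracket upward if needed
        hi *= 2
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if base ** mid < target:
            lo = mid               # exponent too small: move the lower end up
        else:
            hi = mid               # exponent large enough: move the upper end down
    return (lo + hi) / 2

print(exponent_for(25))            # our answer to "10 to what power is 25?"
print(math.log10(25))              # the LOG key, for comparison
```

The two printed values should agree to many decimal places, which is the whole point: the log is just the exponent we were hunting for.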

Back in the sixteen hundreds (in fact even back in the dark ages when I went to school in the 1960's) we didn't have pocket calculators with LOG keys. Companies published books of tables, page after page of the powers of 10 that would give you almost any number you wanted. It's a lot easier to press a key!

Here’s an example of such a table, from the book I used in college (CRC Standard Mathematical Tables):

The table gives the “mantissa”, or decimal part, of logarithms, which is the log of a number from 1 to 10.

For example, to find the logarithm of **1.1223** I find **112** in the left column and **2** on the top, leading me to 04 999; then I see that the next number to the right, 05 038, is an increase of 39, and the little table at the right tells me that **3**/10 of 39 is 11.7. So I add \(04999+11.7=05010.7\), and move the decimal point to the start, \(0.050107\). A calculator shows it to be \(0.0501089\dots\). The table claims to be accurate to five places, so we’d round our answer to \(0.05011\), which agrees.
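That lookup is just linear interpolation between two neighboring table entries. Here is the same arithmetic as a Python sketch, where the two entries for 1.122 and 1.123 are typed in by hand and `interpolate_log` is a name I made up:

```python
import math

def interpolate_log(low_entry, high_entry, tenths):
    """Linear interpolation `tenths`/10 of the way from one table entry to the next."""
    return low_entry + (high_entry - low_entry) * tenths / 10

# Table entries for 1.122 and 1.123; the extra digit of 1.1223 is 3
estimate = interpolate_log(0.04999, 0.05038, 3)
print(f"table with interpolation: {estimate:.6f}")
print(f"computed directly:        {math.log10(1.1223):.6f}")
```

The small difference between the two results comes from the table entries themselves being rounded to five places.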

Yes, calculators are easier …

By the way, when we look up numbers in the table in reverse, finding the number whose log is a given value, they called this the “**anti-log**“, because they thought of the **log** as the main thing, what the table was made for — but in a modern view, it’s the **exponential** that’s primary and the log is its **inverse**. So what was called the anti-log is, in fact, just the exponential function.

I do want to mention that the little table I used gives powers of 10, and so these are called base ten logarithms (or COMMON LOGARITHMS). But almost any number can be used as the base. If you are interested in solving problems with powers of 2 (175 = 2^x) it would be helpful to have a base two logarithm key on your calculator. You probably don't have one, but computer scientists who use powers of two a lot do.

We’ll see more about these other bases soon.

Here’s a look at the history of logarithms, from 1996:

Logarithms: History and Use I have been asked to explain logarithms from a non-numerical sense to non-math-oriented people. It doesn't seem to be enough for me to show the equation and how it works, they want to know why. Any thoughts? Also, do you have short anecdotal history for the development of the concept of logarithm? Finally, why is it called a "logarithm"? logos = reason, arithmos = number.

Doctor Anthony answered the history aspect of Linda’s question, starting with two different purposes for logs:

It is a very great economy of effort if we can reduce multiplication to the addition of two numbers. The possibility of adding numbers that can be looked up in tables compiled "forever," as Napier remarked, instead of carrying out a lengthy process of multiplication, was suggested in two ways that were quite independent. The first arose in connection with the preparation of trig. tables for use in navigation. The second was closely connected with the laborious calculation involved in reckoning compound interest on investments.

Tables of trigonometric functions already existed, and it was found that they could be used as a calculating tool:

In 1593 two Danish mathematicians suggested the use of trig. tables for shortening calculations. They used the formula:

sin(A)*cos(B) = (1/2)sin(A+B) + (1/2)sin(A-B)

Thus to multiply 0.17365*0.99027, you look up in tables and find 0.17365 = sin(10), 0.99027 = cos(8) and the above formula gives

sin(10)*cos(8) = (1/2)(sin(18) + sin(2))

From tables

sin(18) = 0.30902
sin(2) = 0.03490
sin(18) + sin(2) = 0.34392

and (1/2)(sin(18)+sin(2)) = 0.17196

Giving 0.17365*0.99027 = 0.17196

This formula is called the **product-to-sum identity**. Four table lookups and an addition took less work (and was less error-prone) than doing the multiplication by hand.
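Here is the trick as a Python sketch, with `math.asin`, `math.acos`, and `math.sin` standing in for the table lookups. The function name `multiply_by_trig` is mine, and both factors must lie between 0 and 1 for the inverse lookups to make sense:

```python
import math

def multiply_by_trig(p, q):
    """Multiply p and q using sin(A)cos(B) = (1/2)sin(A+B) + (1/2)sin(A-B)."""
    A = math.asin(p)      # "look up" the angle whose sine is p
    B = math.acos(q)      # "look up" the angle whose cosine is q
    return (math.sin(A + B) + math.sin(A - B)) / 2

print(round(math.degrees(math.asin(0.17365)), 3))   # about 10 degrees
print(round(math.degrees(math.acos(0.99027)), 3))   # about 8 degrees
print(multiply_by_trig(0.17365, 0.99027))
print(0.17365 * 0.99027)
```

No multiplication of the two factors appears inside the function, only lookups, one addition, one subtraction, and a halving.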

This device probably suggested to Napier, who is usually called the inventor of logarithms, a simple method for multiplying by a process of addition. Napier had been working on his invention of logarithms for twenty years before he published his results, and this would place the origin of his ideas at about 1594. He had been thinking of the sequences which had been published now and then of successive powers of a given number. In such sequences it was obvious that sums and differences of indices of the powers corresponded to products and quotients of the powers themselves; but a sequence of integral powers of a base, such as 2, could not be used for computations because the large gaps between successive terms made interpolation too inaccurate. So to keep the terms of a geometric progression of INTEGRAL powers of a given number close together it was necessary to take as the given number something quite close to 1. Napier therefore chose to use 1 - 10^(-7) or 0.9999999 as his given number. To achieve a balance and to avoid decimals, Napier multiplied each power by 10^7. That is, if N = 10^7[1 - 1/10^7]^L, then L is Napier's logarithm of the number N. Thus his logarithm of 10^7 is 0. At first he called his power indices "artificial numbers", but later he made up the compound of the two Greek words Logos (ratio) and arithmos (number).

For more details on the word, see here; and here (including how “reason” and “ratio” are related). A given difference in logarithms corresponds to a given **ratio** of numbers.

Napier did not think of a base for his system, but nevertheless his tables were compiled through repeated multiplications, equivalent to powers of 0.9999999. Obviously the number decreases as the index or logarithm increases. This is to be expected because he was essentially using a base which is less than 1. A more striking difference between his logarithms and ours lies in the fact that his logarithm of a product or quotient was not equal to the sum or difference of the logarithms. If L1 = log(N1) and L2 = log(N2), then N1 = 10^7(1-1/10^7)^L1 and N2 = 10^7(1-1/10^7)^L2, so that N1*N2/10^7 = 10^7(1-1/10^7)^(L1+L2), so that the sum of Napier's logarithms will be the logarithm not of N1*N2 but of N1*N2/10^7. Similar modifications hold, of course, for logarithms of quotients, powers and roots. These differences are not too significant, for they merely involve shifting a decimal point.
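We can reproduce Napier's logarithm from the formula N = 10^7(1 - 1/10^7)^L by solving for L with modern natural logs. This is a sketch of the relationship only, certainly not of how Napier built his tables, and the name `napier_log` is mine:

```python
import math

SCALE = 1e7
LN_BASE = math.log1p(-1e-7)        # ln(1 - 10^-7), computed accurately

def napier_log(N):
    """The L for which N = 10^7 * (1 - 10^-7)^L."""
    return math.log(N / SCALE) / LN_BASE

N1, N2 = 5e6, 2e6
print(napier_log(SCALE))                   # his logarithm of 10^7 is 0
print(napier_log(N1) + napier_log(N2))     # sum of the two logarithms
print(napier_log(N1 * N2 / SCALE))         # logarithm of N1*N2/10^7
```

The last two lines should agree, showing that the sum of Napier's logarithms belongs to N1*N2/10^7 rather than to N1*N2.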

So the first “logarithms” weren’t quite what we think of now.

Napier's work was published in 1614 and was taken up enthusiastically by Henry Briggs, a professor of Geometry at Oxford. He visited Napier and discussed improvements and modifications to Napier's method of logarithms. Briggs proposed that powers of 10 should be used with log(1) = 0 and log(10) = 1. Napier was nearing the end of his life, and the task of making up the first table of common logarithms fell to Briggs. Instead of taking powers of a number close to 1, as had Napier, Briggs began with log(10) = 1 and then found other logarithms by taking successive roots. By finding sqrt(10) = 3.162277 for example, Briggs had log(3.162277) = 0.500000, and from 10^(3/4) = sqrt(31.62277) = 5.623413 he had log(5.623413) = 0.7500000. Continuing in this manner, he computed other common logarithms. Briggs published his tables of logarithms of numbers from 1 to 1000, each carried out to 14 places of decimals, in 1617. Briggs also introduced the words "mantissa" for the positive fractional part and "characteristic" for the integral part (positive or negative).
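The first few steps of that procedure are easy to imitate. Here is a loose sketch of the idea in Python, where `briggs_pairs` is a hypothetical helper of mine and nothing here reflects Briggs's actual hand methods:

```python
import math

def briggs_pairs(steps):
    """(number, common log) pairs found from 10 by repeated square roots."""
    pairs = []
    number, log = 10.0, 1.0
    for _ in range(steps):
        number, log = math.sqrt(number), log / 2   # a square root halves the log
        pairs.append((number, log))
    return pairs

for number, log in briggs_pairs(4):
    print(f"log({number:.6f}) = {log}")

# 10^(3/4) as the square root of 10 * sqrt(10), whose log is (1 + 1/2) / 2
three_quarters = math.sqrt(10 * math.sqrt(10))
print(f"log({three_quarters:.6f}) = {(1 + 0.5) / 2}")
```

Every number found this way has an exactly known logarithm, and products of such numbers fill in more of the table.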

These last two terms were important in using tables, but not so much today.

The first tables of logarithms contained inaccuracies which were noticed and corrected from time to time. The labor expended in constructing them was enormous, and it stimulated the search for better methods of calculating them. This gave a new impetus to the study of infinite series, for example sqrt(2) = (1 - (1/2))^(-1/2) which gives rise to an infinite, convergent series when expanded according to the binomial theorem. This work culminated in the extremely important exponential series, where e = Limit {1 + 1/n}^n as n -> infinity. It is easy to show that e^x = Limit {1 + 1/n}^(nx) generates the series shown below: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... to infinity, and e = 1 + 1 + 1/2! + 1/3! + 1/4! + .... = 2.718281828... e is now used as the base of logarithms in almost all advanced work.

Modern calculators use something related to this when you push the button. And this number *e* leads us to the **natural logarithm**:

Doctor Anthony mentioned a second purpose for logarithms, related to compound interest. That comes up in this 1998 question:

History and Applications of the Natural Logarithm We have been using e in continuous growth problems, where we are examining the growth potential of natural disease populations. I'm so surprised at how often this number comes up in other applications, though. Where did e come from? Who first derived it? Why is it so common in the field of biology?

The question is about the number \(e\approx2.71828\dots\), which is the base of the **natural logarithm**, called \(\log_e(x)\) or \(\ln(x)\) (and often just written as \(\log(x)\) at higher levels, where it is the default).

Doctor Rob answered:

John Napier, the inventor of logarithms, is credited with discovering this constant. Leonhard Euler is credited with popularizing the use of the letter e for this number. There are two reasons for its frequent appearance in mathematical contexts. The first is that it is the limit, as n grows without bound, of (1 + 1/n)^n. This particular definition lends itself to problems involving continuous compounding of interest. Think of adding 1/n of the current total to that current total n times.

This formula would represent compound interest at a rate of 100%, compounded *n* times a year. As we compound more often, so that our interest earns interest, we earn more; so that with continuous compounding, rather than doubling in a year, the investment would be multiplied by 2.718. In fact, with daily compounding, the amount after a year would be $$\left(1+\frac{1}{365}\right)^{365}=2.714567\dots$$ Hourly compounding would yield $$\left(1+\frac{1}{365\cdot24}\right)^{365\cdot24}=2.718126\dots$$
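Those figures are easy to reproduce. Here is a small Python sketch; the function name `growth_factor` is just my label for the expression \(\left(1+\frac{1}{n}\right)^n\).

```python
import math


def growth_factor(n):
    """Growth over one year at 100% interest, compounded n times."""
    return (1 + 1 / n) ** n


for label, n in [("yearly", 1), ("monthly", 12), ("daily", 365), ("hourly", 365 * 24)]:
    print(f"{label:8s} {growth_factor(n):.6f}")
print(f"{'limit e':8s} {math.e:.6f}")
```

The values climb toward *e* but never reach it, which is what "continuous compounding" means as a limit.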

The result is that when interest on principal *P* is compounded continuously at an annual rate of *r*, the amount after *t* years is $$A=Pe^{rt}$$ The natural logarithm, the log with base *e*, is the inverse function, so to find the time at which a certain amount is obtained, we take the logarithm of the ratio \(\frac{A}{P}\): $$t=\frac{1}{r}\log_e\left(\frac{A}{P}\right)$$
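As a sketch of how the two formulas work together, here they are in Python. The function names and the sample figures ($1000 at 5%) are my own choices for illustration.

```python
import math


def amount(principal, rate, years):
    """A = P * e^(r*t), continuous compounding."""
    return principal * math.exp(rate * years)


def years_to_reach(target, principal, rate):
    """t = (1/r) * ln(A/P), the inverse of the formula above."""
    return math.log(target / principal) / rate


t = years_to_reach(2000, 1000, 0.05)  # time to double at 5%
print(t)                              # a little under 14 years
print(amount(1000, 0.05, t))          # back to 2000, up to rounding
```

Notice that the doubling time depends only on the rate, not on the principal: it is \(\frac{\ln 2}{r}\).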

The second is that e is the only constant such that the slope of the graph of y = e^x at 0 is 1; or, in other words, the function e^x is the solution of the differential equation dy/dx = y, with initial conditions y = 1 when x = 0. Similarly, the solution of dy/dx = a*y, with initial conditions y = 1 when x = 0, is e^(a*x).

Here are the graphs of exponential functions with several different bases (2, *e*, 10), showing that the slope of \(y=e^x\) at \(x=0\) is 1:

This is the reason that *e*, and natural logs, are ubiquitous in calculus and related higher mathematics.
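We can estimate those slopes numerically. This Python sketch uses a central difference with a small step `h`; both the helper name and the step size are my assumptions, and the result is an approximation, not an exact derivative.

```python
import math


def slope_at_zero(base, h=1e-6):
    """Approximate the slope of y = base^x at x = 0 by a central difference."""
    return (base**h - base**(-h)) / (2 * h)


for base in (2, math.e, 10):
    print(base, slope_at_zero(base), math.log(base))
```

For each base the estimated slope agrees with the natural log of the base, so only base *e* gives a slope of 1.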

The logarithm is the inverse function, obtained by swapping the roles of *x* and *y*:

This is the biological connection. You are talking about the instantaneous rate of growth of something being proportional to the amount of the something. That something can be a population of bacteria in a culture, or a population of cockroaches in a garbage dump. Here, "k" represents the fraction of them reproducing at any given time x. If "k" is negative, it could represent the fraction of organisms dying at any given time.

Unrestricted growth follows an exponential function, derived from that differential equation, namely $$N=N_0e^{kx}$$

This is the model for radioactive decay of radium, uranium, and so on. This is also the approximate model for inflation -- exponential growth.

Radioactive decay is the same as exponential growth, but with a negative rate: $$N=N_0e^{-kx}$$

The way to calculate e is by using as many terms as necessary for the accuracy you need in the following infinite series: e = 1 + 1 + 1/2 + 1/(2*3) + 1/(2*3*4) + 1/(2*3*4*5) + 1/(2*3*4*5*6) + ... The terms shown above are already enough to obtain a value for e that is roughly 2.718.
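Here is that calculation as a short Python sketch, adding up exactly the seven terms shown (through 1/(2\*3\*4\*5\*6) = 1/720). The loop structure is my own; each term is obtained from the previous one by one more division.

```python
import math

total, term = 0.0, 1.0
for k in range(7):   # terms 1, 1, 1/2, 1/6, 1/24, 1/120, 1/720
    total += term
    term /= k + 1    # next term: divide by the next integer

print(total)         # partial sum, 1957/720
print(math.e)        # for comparison
```

The partial sum already agrees with *e* to three decimal places, and each further term adds roughly another digit.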

It’s worth pointing out that we can actually use any base we want to describe these growth and decay problems; we’ll see that in another post. But the natural exponential, and the natural logarithm, are what arise directly from the differential equation, and directly display the growth rate.

We’ll close with a 2005 question to tie things together:

Common Logarithms and Natural Logarithms I am currently studying logarithms and I saw that logarithms can take the form of ln or log. What is the difference between the two? I think it's very confusing because I looked it up in my math book and they state log e x = ln x. Then they state log 10 x = log x. I am confused about how e and 10 work with ln and log. I think that if I see a problem such as y = ln(x-1) and the book asks to find the inverse, I have to change it to log, but I don't know how to do that. I simply don't understand ln and log's relationship.

Doctor Tom answered, starting with the **common logarithm**:

Hi Katie, You're confused because it really is confusing! When logarithms are first introduced, it is much easier for students to think about logarithms in base 10. In base 10, log(.01) = -2, log(.1) = -1, log(1) = 0, log(10) = 1, log(100) = 2, log(1000) = 3, and so on--it just sort of counts the zeros, or, more accurately, the log is the power of 10 that creates the number you are taking the log of. Since 100 is 10^2, the log of 100 is 2.

Using common logarithms, we only need a table for values between 1 and 10, because $$\log(a\cdot10^n)=\log(a)+\log(10^n)=\log(a)+n,$$ so we can separately find the “characteristic” (the whole number part) and the “mantissa” (the fractional part) of the logarithm. For example, $$\log(112.23)=\log(1.1223\cdot10^2)=\log(1.1223)+\log(10^2)=\log(1.1223)+2\approx0.05011+2=2.05011$$
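To make the split concrete, here is a Python sketch that separates a common logarithm into its two parts. The function name is mine; the rule it follows (the characteristic is the whole-number part rounded down, so the mantissa is never negative) is the convention described in the history above.

```python
import math


def characteristic_and_mantissa(x):
    """Split log10(x) into an integer part and a nonnegative fractional part."""
    log_x = math.log10(x)
    characteristic = math.floor(log_x)
    return characteristic, log_x - characteristic


print(characteristic_and_mantissa(112.23))    # characteristic 2
print(characteristic_and_mantissa(0.011223))  # characteristic -2, same mantissa
```

Both numbers have the same digits, so they share a mantissa (about 0.05011); only the characteristic changes with the position of the decimal point. That is why a table for numbers from 1 to 10 is enough.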

Engineers also tend to use log base 10 for most calculations for the same reason. You can just look at the size and know the magnitude of the number. If the log (base 10) is 5.46, without even thinking, you know the number is "between" 10^5 and 10^6, so it's between 100,000 and 1,000,000. So in introductory texts and in engineering books, when you see "log", it usually means "log base 10". If they DO want to talk about other bases, they either put a little subscript after the "log", like "log_e". (I can't draw a subscript, so I'm using the "_" to mean that the next character should be smaller and written as a subscript).

So \(\log\) is just a nickname for the common log, \(\log_{10}\); and \(\log_{e}\) is the natural log, nicknamed \(\ln\).

Since log base e is often important for engineers, particularly electrical engineers, they often use "ln" instead of "log_e" since it's quicker to write, and it's a mnemonic for "logarithm, natural", or "natural logarithm". Now for mathematicians, the "natural log" really IS much more natural, so since that's the ONLY type of logarithm they use, they often just write "log" instead of "ln". I know this seems really confusing, but if you know what you're doing, you can almost always tell that it's a natural log just by looking at the equation. Of course if there is a chance of confusion, everybody always writes them like "log_10" or "log_e" to make it obvious what's going on.

So “log” means the default in whatever field you are in. (This can cause confusion if you look around online, because you may stumble into a world other than that of your class, in which, say, “log” is used where your teacher would use “ln”. Be careful!)

Since you are just starting to learn about logarithms, you will always see "log" as meaning "log_10". Computer scientists often have a very good use for "log base 2", or "log_2", and it comes up so often there that they often write "lg" instead of "log_2". I've never seen them shorten "lg" or "log_2" to just "log" like the mathematicians, however, so you're never in any danger there.

In fact, it is not uncommon in computer science today for “log” to mean base 2. Wikipedia says,

Many disciplines write \(\log x\) as an abbreviation for \(\log_b x\) when the intended base can be inferred based on the context or discipline (or when the base is indeterminate or immaterial). In computer science, \(\log\) usually refers to \(\log_2\), and in mathematics \(\log\) usually refers to \(\log_e\). In other contexts, \(\log\) often means \(\log_{10}\).

When you solve a problem, check how the symbol is used in your source.
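Programming languages are one such source. In Python's standard `math` module, for instance, plain `log` follows the mathematicians' convention (base *e*), and the other bases have their own names. A brief sketch:

```python
import math

x = 1000
print(math.log(x))      # natural log, ln(1000)
print(math.log10(x))    # common log, base 10
print(math.log2(x))     # base 2
print(math.log(x, 10))  # base given explicitly as a second argument
```

So a formula copied from an introductory textbook, where "log" means base 10, would need `math.log10` here, not `math.log`.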

We’ll see more about logarithms next time.


We’ll start with a question from 1997:

Casting Out Nines and Elevens At a parent-teacher meeting this evening, the teacher asked the parents why nine is used in proving a math answer. She did not know the answer, and the parents didn't either. Can you help? Our fourth-grade students will be taking this new method of problem solving. EXAMPLE: 6313: 6+3 = 9, throw away; 1+3 = 4. 1452: 4+5 = 9, throw away; 1+2 = 3. 7765: 7+7+6+5 = 25, 2+5 = 7. If you're left with 7, then your answer is correct.

Susan has found the digital roots of 6313 and 1452 (4 and 3, respectively, whose sum is 7), and of their presumed sum, 7765 (also 7); as we saw last time, this doesn’t actually prove the answer *correct*, but only plausible (that is, it *does not* show that it is *incorrect*).

Doctor Rob answered, introducing ideas of modular arithmetic gradually:

This procedure for checking arithmetic is called "casting out nines," and has been known for some centuries. It is based on modular arithmetic with modulus nine. Nine is used because it is the largest integer such that it fits into the following pattern: 10 = 9*1 + 1; 100 = 9*11 + 1; 1000 = 9*111 + 1; 10000 = 9*1111 + 1; 100000 = 9*11111 + 1; ... The powers of 10 on the left represent the place values of the various digits. The 1 on the far right represents the digit itself.

Now we can use those facts to break down any number.

Your example: 6313 = 6*1000 + 3*100 + 1*10 + 3 = 6*(9*111+1) + 3*(9*11+1) + 1*(9*1+1) + 3 = 9*(6*111+3*11+1*1) + 6 + 3 + 1 + 3; 6313 - (6+3+1+3) = 9*(6*111+3*11+1*1). This shows that the number and the sum of its digits differ by a multiple of nine. This is true for every counting number (positive integer). A consequence of this is that if, in an addition, subtraction, or multiplication problem, you replace each number by the sum of its digits, you change the result by a multiple of nine.

Here’s a way to see that:

If the numbers \(A\) and \(B\) differ by a multiple of 9, that is, \(A-B=9n\), then

- \((A+C)-(B+C)=(A-B)+(C-C) = 9n\), so the **sums** \(A+C\) and \(B+C\) differ by a multiple of 9;
- \((A-C)-(B-C)=(A-B)-(C-C) = 9n\), so the **differences** \(A-C\) and \(B-C\) differ by a multiple of 9;
- \(AC-BC=(A-B)C = 9nC\), so the **products** \(AC\) and \(BC\) differ by a multiple of 9.

In each case, replacing \(A\) with \(B\) makes a result that again differs from the original by a multiple of 9.

In particular, then, any sum, difference, or product is changed by a multiple of 9 when you replace each number by the sum of its digits.

The usual notation for this situation is to say that two numbers x and y are congruent modulo m if their difference is a multiple of m. This is written x = y (mod m). In the case at hand 6313 = 6+3+1+3 = 13 (mod 9), and further 13 = 1+3 = 4 (mod 9). Similarly 1452 = 1+4+5+2 = 12 (mod 9), and further 12 = 1+2 = 3 (mod 9). The answer 7765 = 7+7+6+5 = 25 = 2+5 = 7 (mod 9).

Properly, we express congruence using a special “triple-equal” symbol: \(x\equiv y(\text{mod }9)\); that symbol could be used for all the equal signs here.

The word “modulus” is Latin for “small measure”; the modulus was thought of as a unit by which the difference between two numbers can be “measured” (evenly divided). “Modulo” is its ablative form, meaning “*with respect to* the modulus …”. So \(x\equiv y(\text{mod }9)\), read as “*x* is congruent to *y* modulo 9”, means “*x* is congruent to *y* with respect to the modulus 9”, or “*x* and *y* differ by a multiple of 9”.

If the answer 7765 is correct, then 6313 + 1452 = 4 + 3 = 7 (mod 9) and 7765 = 7 (mod 9) provides a degree of checking. If these two numbers disagree, you know you have made an arithmetic error. If they are the same, you have a certain degree of confidence that you have not, although not certainty! For example, a transposition of digits (7675, say) cannot be detected by casting out nines.

What we’ve seen here, expressed fully in terms of modular arithmetic, is that (1) because \(10\equiv 1(\text{mod }9)\), any number (written in base ten) is congruent to the sum of its digits, so that the digital root (obtained by repeatedly adding digits) is likewise congruent to the original number; and (2) addition, subtraction, and multiplication all work mod 9, so that we can replace each operand with anything congruent to it, and the result will again be congruent. Together, this explains casting out nines.
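Both facts can be checked by machine. In this Python sketch, `digital_root` is my name for the repeated digit sum, and `%` is Python's remainder operator; the code confirms that the digital root always agrees with the remainder on division by 9, and then reruns Susan's check.

```python
def digital_root(n):
    """Repeatedly add the digits of a nonnegative integer until one digit is left."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n


for n in (6313, 1452, 7765):
    print(n, digital_root(n), n % 9)

# Fact (1): the digital root is congruent to the number modulo 9
# (a remainder of 0 shows up as a digital root of 9).
print(all(digital_root(n) % 9 == n % 9 for n in range(1, 100_000)))

# Fact (2): so the check on a sum can be done entirely with digital roots.
a, b, claimed = 6313, 1452, 7765
print((digital_root(a) + digital_root(b)) % 9 == digital_root(claimed) % 9)
```

Trying `claimed = 7675` gives the same result, which is the undetected transposition mentioned above.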

There is a related technique called "casting out elevens" which is based on the following pattern: 1 = 11*0 + 1; 10 = 11*1 - 1; 100 = 11*9 + 1; 1000 = 11*91 - 1; 10000 = 11*909 + 1; 100000 = 11*9091 - 1; 1000000 = 11*90909 + 1; 10000000 = 11*909091 - 1; ... 6313 = 6*1000 + 3*100 + 1*10 + 3 = 6*(11*91-1) + 3*(11*9+1) + 1*(11*1 - 1) + 3 = 11*(6*91+3*9+1*1) - 6 + 3 - 1 + 3; 6313 - (3-1+3-6) = 11*(6*91+3*9+1*1). It uses the alternating-sign sum of digits, working *right to left*: 6313 = 3 - 1 + 3 - 6 = -1 = 10 (mod 11); 1452 = 2 - 5 + 4 - 1 = 0 (mod 11); 7765 = 5 - 6 + 7 - 7 = -1 = 10 (mod 11). The fact that 10 + 0 = 10 (mod 11) gives an additional check on the validity of the arithmetic.

Replacing each number with its elevens-root (I don’t know a standard word for this) means we are “doing the arithmetic modulo 11”; if the result doesn’t agree (modulo 11), then one of the calculations is wrong!

Casting out elevens is much less well known than casting out nines. It WILL detect digit transpositions.

But it will still miss transpositions between digits an even number of places apart, or other errors!
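Here is a small Python sketch of casting out elevens, showing one transposition that is caught and one that is not. The function names are mine, and I rely on Python's `%` returning a nonnegative result for a positive modulus, so that -1 becomes 10.

```python
def elevens_check(n):
    """Alternating digit sum, working right to left, reduced modulo 11."""
    digits = [int(d) for d in reversed(str(n))]
    alternating = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))
    return alternating % 11


def sum_passes(a, b, claimed):
    return (elevens_check(a) + elevens_check(b)) % 11 == elevens_check(claimed)


print(sum_passes(6313, 1452, 7765))  # True: the correct sum
print(sum_passes(6313, 1452, 7675))  # False: adjacent digits swapped
print(sum_passes(6313, 1452, 7567))  # True: digits two places apart swapped
```

Swapping digits two places apart leaves each digit with the same sign in the alternating sum, so the check cannot see the change.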

Now, can we explain all this without the complexity?

A 2004 question makes a big request:

Casting Out Nines for 2nd Graders Dear Dr. Math, Can you please send me a simpler explanation of WHY the Casting Out Nines method works than you have in your archives? I understand how to do it, but I need to know why it works. I have to present this method so that second graders can understand why it works.

I answered:

Hi, Donna. I've tried a number of times to find a good way to prove the method at an elementary level, and found that any really clear explanation requires some knowledge of modular arithmetic and algebra. Without those ideas, there's just too much work needed to work around them. But that's talking mostly about proofs. Second graders wouldn't appreciate an actual proof anyway, so we can just look for a plausible explanation to show HOW it works -- what's going on behind the scenes. That may be easier to handle, though we'll have to keep it extremely simple.

The first step is to boil the idea down to its essentials at an adult level.

The basic idea, leaving out all the details, is this: Adding the digits of a number decreases it by a multiple of 9, so repeating the process until you have a single digit leaves you with the remainder after division by 9 (or with 9 if that remainder is 0). (In advanced terms, the digit sum is congruent to the original number modulo 9). When numbers are added or multiplied, the remainder of the result is the same as the sum or product of the remainders of the given numbers. (In advanced terms, the sum and product are well-defined in modular arithmetic; that is, they preserve congruence.) So if the remainders on both sides of an equation are not equal, then neither are the values of both sides themselves.

Now we want to take out the hard language:

For second graders, we'll want to work with a specific example rather than with generalities, and with addition rather than multiplication. Take this sum as our example: 24 + 37 = 51 Is this correct? Well, we look at the left side, the sum. We add the digits of 24 and get 6; we add the digits of 37 and get 10, then add again to get 1. Now we add 6 and 1 to get 7. On the right side, we add the digits of 51 to get 6; since this is not equal to 7, our answer is wrong. (The right answer is 61, which does give a digit sum of 7.)

(Possibly it would be better to use an example of a *correct* calculation.)

Now, what is happening when we do that? Instead of adding 24 + 37, we're adding 6 + 1. Now, 6 is 18 less than 24, and 1 is 36 less than 37, so their sum is 18 + 36 less than the real sum. Do you see that 18 and 36 are both multiples of 9? That means that the sum we get, 7, is some number of 9's less than the real sum. This will always happen. [Why? Because replacing 20+4 with 2+4 takes away 20 and adds 2, which is the same as taking away 10 and adding 1 twice. But that means taking away a multiple of 9.]

Changing 10 to 1 reduces it by 9; changing 20 to 2 reduces it by twice as much. Every time, we are taking away a multiple of 9.
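For the adults in the room, a computer can confirm that this "always happens". In the Python sketch below, `digit_sum` (my name) adds the digits just once, so for 37 it gives 10 and a difference of 27, rather than the 36 we get after adding digits a second time; either way the difference is a multiple of 9.

```python
def digit_sum(n):
    """Add the digits of a nonnegative integer once."""
    return sum(int(d) for d in str(n))


for n in (24, 37, 51):
    print(n, digit_sum(n), n - digit_sum(n))  # differences 18, 27, 45

# The amount taken away is a multiple of 9 for every number tried.
print(all((n - digit_sum(n)) % 9 == 0 for n in range(10_000)))  # True
```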

When we do the same thing to the 51, the sum of the digits, 5+1=6, is 45 less than 51 itself; so again we get a number that is some number of 9's less than the actual number. But that means that the numbers we get on the two sides should themselves be a multiple of 9 apart. In fact, since they are both single-digit numbers, they should be equal (unless one is 0 and the other is 9). Since they are not, the numbers can't be right.

This is an interesting way to see that 0 and 9 are equivalent here – the reason we call it “casting out nines”.

I'm not at all sure this will work for the average second grader; even though I avoided going into detail about place value and remainders, I had to bring in multiples of 9. It's hard to avoid something like that, since that is what the technique is all about under the hood: multiples, remainders, modular arithmetic. Please let me know if you find a nice way to express these ideas for that age.

My real hope was that this would at least be comprehensible to the adults trying to explain!

Donna wrote back:

Thank you very much for the speedy reply. My presentation is tomorrow. Maybe the other educators in my group will have a suggestion for explaining it to second graders. I'll let you know if they do. Again, thank you!!!

We didn’t hear back.

A question in 2008 gave me another chance at a basic “proof” at a slightly higher level:

Why Casting Out Nines Works Hello. I have to do a math project for school, and one of the things I must know is why the method of "casting out nines" works. I know it has to do with modular arithmetic but can you please explain the math behind this method AS SIMPLY AS POSSIBLE. I would like it if you could explain it in simple terms, not college math. NOTE: I know how to do this method, so please do not include that with your response. I cannot find any place that clearly defines why this method works to me, because I don't understand what the web site is trying to say. I have tried reading about modular arithmetic but that is only part of the reason why this works and I really need someone to explain it loud and clear to me.

It’s possible that Las needed a clear explanation of the modular arithmetic as used in the proof; but it sounded like an explanation without that, and even without algebra, would be useful. I answered:

Hi, Las. Modular arithmetic provides a language in which it is easy to explain what is happening; without that language, it will take more words to make it clear (and even more if I wanted to give a complete proof), but it can be done. The basic idea of modular arithmetic is that two numbers are congruent modulo 9 when they leave the same remainders on division by 9; you'll be seeing phrases like that all through what I write here!

What I’m trying to do is to translate the modular arithmetic language into the language of remainders. We’ll see if it works. I took the same general approach as above, using an example rather than lots of symbols.

Let's take a simple example; I'll check the addition 157 + 246. This way we can avoid using a lot of variables, but you can follow the ideas and see that they apply to any number. (This is how people talked about algebra before they had the idea of variables, so I'm following an old tradition.)

[As an example, here is how Al-Khwarizmi explained how to solve \(x^2+bx=c\) (one of several cases of quadratic equations, when no negatives are allowed) by completing the square, using the example of \(x^2+10x=39\):

Symbols, like special words, are useful, though it is possible to do without them! But that’s just an aside …]

First, we’ll just do the check by casting out nines:

The check digit for 157 is 1+5+7 = 13, 1+3 = 4; and the check digit for 246 is 2+4+6 = 12, 1+2 = 3. The first thing to ask is, what do these numbers mean? The answer is, the check digit gives the remainder when you divide by 9. In our example, when you divide 157 by 9 you get 17 with a remainder of 4 (our check digit), and when you divide 246 by 9 you get 27 with remainder 3 (again our check digit).

Why is that? What can adding the digits of a number have to do with dividing and getting a remainder? Well, let's look at 157, which we can write as 1*100 + 5*10 + 7. Notice that 10 = 9+1 and 100 = 99+1; any power of ten is 1 more than a multiple of 9, and therefore will leave a remainder of 1 when divided by 9. So we can rewrite 157 as 157 = 1*100 + 5*10 + 7 = 1(99 + 1) + 5(9 + 1) + 7 = (1*99 + 5*9) + (1 + 5 + 7). Since the first part is a multiple of 9, the second part (the sum of the digits) will have the same remainder; whatever remainder you get when you divide 1+5+7 by 9 is the remainder when you divide 157 by 9.

The number itself can be broken into a multiple of 9, plus the sum of the digits. Any two numbers that differ by a multiple of 9 (that is, that are congruent modulo 9) leave the same remainder.

So we can repeat the process with 1+5+7 = 13, adding its digits and again getting the same remainder. Once we get this down to one digit, it IS the remainder. So what have we learned? The check digits are remainders; now we have to consider what happens to remainders when you add numbers. This will take more of the same kind of thinking.

Take our two numbers, 157 = 9*17 + 4 and 246 = 9*27 + 3. I've written each as a multiple of 9 plus its remainder (which is its check digit). Now let's add them: 157 + 246 = (9*17 + 4) + (9*27 + 3) = (9*17 + 9*27) + (4 + 3) = 9*(17 + 27) + (4 + 3). So the remainder when you divide the sum by 9 is the sum of the remainders, 4+3 = 7. Well, not always--if the sum of the remainders had been greater than 9, you'd have to divide by 9 again and take the final remainder. But in all cases the remainder of the sum is the same as THE REMAINDER OF the sum of the remainders. This is where the language of modular arithmetic saves a lot of words!

In terms of modular arithmetic, the sum of the two numbers is congruent to the sum of the check digits.

This means that the check digit for the sum is the same as the check digit for the sum of the check digits--and that is what casting out nines is. If you find that this is not true, you know that the sum is incorrect.

Similarly, we find that the **product** of two numbers is congruent to the **product** of their check digits. Using the same numbers as an example,

$${\color{Red}{157}}\cdot{\color{Green}{246}}=(9\cdot17+{\color{Red}4})(9\cdot27+{\color{Green}3})\\=(9^2\cdot17\cdot27+9\cdot17\cdot{\color{Green}3}+{\color{Red}4}\cdot9\cdot27)+{\color{Red}4}\cdot{\color{Green}3}\\=9\cdot(9\cdot17\cdot27+17\cdot{\color{Green}3}+{\color{Red}4}\cdot27)+{\color{Red}4}\cdot{\color{Green}3}$$ so the product leaves the same remainder as the product of the check digits, 12, namely 3. And, in fact, \(157\cdot246=38,622\), whose check digit is \(\require{cancel}\cancel{3}+8+\cancel{6}+2+2=12;\;\;1+2=3\).
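The same expansion can be verified in a few lines of Python. This is only a sketch of the arithmetic for this one example; `divmod` returns the quotient and remainder together, and the variable names are mine.

```python
a, b = 157, 246
qa, ra = divmod(a, 9)   # 157 = 9*17 + 4
qb, rb = divmod(b, 9)   # 246 = 9*27 + 3

# Expand (9*qa + ra)(9*qb + rb) and collect everything that contains a factor of 9.
multiple_of_nine = 9 * (9 * qa * qb + qa * rb + ra * qb)

print(a * b == multiple_of_nine + ra * rb)  # True: 38622 = 38610 + 12
print((a * b) % 9, (ra * rb) % 9)           # both are 3
```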

Las replied:

Thank you so very much. This was the most helpful thing any web site did for me. I have a feeling I'm going to get a 100 on this math project! Thanks again for taking your time to help me out. Without you, I wouldn't understand this.

So, this time, it worked.


We’ll start with a question from James in 1998:

Casting Out Nines I am trying to find a reference which defines this mathematical operation ["casting out nines"]. I have looked in multiple texts in library and book stores without success. A definition would be a start for me to try and understand the concept. Thanks

Doctor Rob answered:

Casting out nines is the name of a technique for checking arithmetic. It depends for its use on the idea of the digital sum of a number. The digital sum of any positive integer (or whole number) is gotten by adding up all the digits of the number. If the result has more than one digit, repeat this, until the result is a one-digit number. That digit is the digital sum of the starting positive integer. Example: 9974 -> 9+9+7+4 = 29 -> 2+9 = 11 -> 1+1 = 2, so 2 is the digital sum of 9974. Let's write s(9974) = 2. If you are familiar with modular arithmetic, the digital sum of a number is the smallest nonnegative representative of its congruence class modulo 9.

Another way to say that last bit is that the digital sum turns out to be the **remainder** (almost) when you divide the number by 9. For example, if you divide 9974 by 9, you get a quotient of 1108 and a remainder of 2, because \(9974=9\times1108+2\). Adding digits is a quick way to find this remainder.

The number that is here called the **digital sum** has several names; the most formal term is **digital root**, which clarifies that it is more than just a simple sum; I’ll later use **check digit** for simplicity.

Now the important facts about digital sums and arithmetic are that: s(a+b) = s(s(a)+s(b)), s(a*b) = s(s(a)*s(b)).

The idea here is that the digital sum is **preserved** by addition and multiplication: **The digital sum of a sum is the digital sum of the sum of the digital sums of the addends**, and the same for multiplication. It’s much harder to say than to do!

We use this to check addition and multiplication as follows: 9974 + 2348 ?=? 12422. s(s(9974)+s(2348)) = s(2+8) = s(10) = 1, s(12422) = 2. This means that the sum given is incorrect.

So if we think that \(9974+2348=12,422\), we can check the digital sums and see that it can’t be correct, because if it were, the digital sums of both sides would be the same. (In fact, \(9974+2348=12,322\). One digit was wrong.)

Similarly, for **multiplication**:

9974*2348 ?=? 23418952. s(s(9974)*s(2348)) = s(2*8) = s(16) = 7, s(23418952) = 7. This means that the product given is likely to be correct.

We thought that \(9974\times2348=23,418,952\), and the digital sums are **compatible** with that claim. But it doesn’t prove we are actually right:
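Both checks are short enough to run as written. In this Python sketch, `s` is the digital sum in Doctor Rob's notation; the implementation (looping over the decimal digits) is my own.

```python
def s(n):
    """Digital sum of a nonnegative integer: add digits until one digit is left."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n


a, b = 9974, 2348
print(s(s(a) + s(b)), s(12422))     # 1 and 2: the sum 12422 is wrong
print(s(s(a) + s(b)), s(a + b))     # 1 and 1: the true sum passes
print(s(s(a) * s(b)), s(23418952))  # 7 and 7: the product is plausible
```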

This kind of checking will find many errors, but not all! An interchange of two digits (23418952 vs. 23419852) will not be detected, and replacing a 9 by a 0 or vice versa will not be detected.

Swapping digits will not change the digital sums, so the answer still looks like it could be right. As we’ll see, if we’re wrong, there is about an 11% chance (one in nine) that the check will fail to catch the error.
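One rough way to put a number on how often the check is fooled is to count, among wrong answers near the true product, how many happen to share its check digit. The Python sketch below does this for a window of 9000 consecutive candidates; the window size is an arbitrary choice of mine, and real mistakes are not uniformly random, so treat the ratio as a ballpark only.

```python
def check_digit(n):
    """Add digits repeatedly until a single digit is left."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n


true_product = 9974 * 2348
wrong = [n for n in range(true_product - 4500, true_product + 4500)
         if n != true_product]
missed = [n for n in wrong if check_digit(n) == check_digit(true_product)]

print(len(missed), len(wrong), len(missed) / len(wrong))
print(23419852 in missed)  # the transposed answer is one of the misses
```

About one wrong answer in nine slips through, and the rest are caught.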

To check subtraction, use the fact that a - b = c means a = b + c. To check division, use the fact that a/b = c means a = b*c. To deal with zero, you can define s(0) = 0. To deal with negative numbers, you can define s(-a) = 9 - s(a).

We’ll see examples of subtraction and division below. When one of the numbers is zero, we don’t really need to check; but it’s nice to know what to do; when one is negative, we usually deal with signs separately. We’ll see the last comment, about negatives, below when we check subtraction directly.

That got a little complicated. Here’s another question, from 1999:

Casting Out Nines to Check Arithmetic My teacher was talking about casting out 9's. She said it was the easiest way to check the problems. No one in the class understands this method. She is a new teacher, we're in the 6th grade, and she was teaching high school math. Thank you, Jason Fowler

I fully understand the difficulty of explaining this at a fully elementary level! I first referred to the answer above and to an explanation we’ll see next time:

Hi, Jason. Casting out nines is not high-school math if all you want to do is use it; but it can take some effort to explain why it works without getting into hard stuff. Here are two explanations of it in our archives, both of which get into some deeper ideas than you want by way of explaining how it works: Casting Out Nines and Elevens http://mathforum.org/library/drmath/view/55805.html Casting Out Nines http://mathforum.org/library/drmath/view/55831.html I'll give you a quick explanation of how to do it, without the big words.

We’ll see *why* it works next time, again at elementary and advanced levels.

First, for any number we can get a single digit, which I will call the "check digit," by repeatedly adding the digits. That is, we add the digits of the number, then if there is more than one digit in the result we add its digits, and so on until there is only one digit left. For example, for the numbers 6395 and 1259, we get: 6395 --> 6 + 3 + 9 + 5 = 23 --> 2 + 3 = 5; 1259 --> 1 + 2 + 5 + 9 = 17 --> 1 + 7 = 8

My term “check digit” is commonly used in computers, where for example UPC or ID numbers often include an extra digit that checks whether the others are copied correctly. I used this general term to keep things simple.

Now, it turns out that if you add or multiply a set of numbers, the check digit of their sum is the same as the check digit of the sum of the check digits. You can think of it like this:

    numbers -----> digits
       |    check    |
       |     (1)     |add
       |             |(2)
       |             V
       |add         sum -----> digit
       |(4)            check     |
       |                (3)      |equal?
       |          (5)            | (6)
       V         check           V
      sum -------------------> digit

That is, if you calculate the check digit (1) of each number you're adding, then add these (2) and calculate the check digit of the sum (3), that should be the same as the check digit (5) of the sum (4) you are checking.

You don’t need to use the diagram, but it was the best way I could think of to picture the process.

In our example, the sum of the numbers is 6395 + 1259 = 7654 --> 7 + 6 + 5 + 4 = 22 --> 2 + 2 = 4, with check digit 4, and the sum of their check digits is 5 + 8 = 13 --> 1 + 3 = 4. So the check digit of the sum is 4, and the check digit of the sum of the check digits 5 and 8 is also 4. If they didn't agree, we'd know something was wrong. Here's my diagram:

    6395 -------> 5
    1259 -------> 8
      |   check   |
      |           |add
      |           |
      |           V
      |add       13 -------> 4
      |             check    |
      |                      |equal? yes!
      |                      |
      V        check         V
    7654 ------------------> 4

The same routine works when you multiply:

Similarly, for multiplication, the product of the numbers is 6395 x 1259 = 8051305 --> 8 + 0 + 5 + 1 + 3 + 0 + 5 = 22 --> 2 + 2 = 4, and the product of the check digits is 5 x 8 = 40 --> 4 + 0 = 4, which agrees with our product:

    6395 -------> 5
    1259 -------> 8
      |   check   |
      |           |mult
      |           |
      |           V
      |mult      40 -------> 4
      |             check    |
      |                      |equal? yes!
      |                      |
      V        check         V
    8051305 ---------------> 4

(You wouldn't normally get the same check digit for the result of the sum and the products; I just picked a weird example.)

You can also apply the process to subtraction and division, but because of some special cases you have to deal with, it's easier to transform the problem to an addition or multiplication. For example, to check the subtraction 7654 - 6395 = 1259, you would transform it to the addition I did above. To check the division 8051647 ÷ 1259 = 6395 rem 342, you would transform it to the multiplication I did above, by adding the check digit of the remainder to the product of the check digits of the quotient and the divisor, and checking whether this is equal to the dividend:

       6395 -->   5
     x 1259 --> x 8
    -------     ---
    8051305      40 --> 4
      + 342 ----------> + 9
    -------             ---
    8051647 ---> 4 <--- 13

In other words, you apply casting out nines not to the subtraction or division itself, but to the standard check, in which you reverse the problem by adding the subtrahend to the difference or multiplying the quotient by the divisor.

So the division $$8051647\div1259=6395\text{ R }342$$ is equivalent to the equation $$1259\times6395+342=8051647$$ so we need to check the multiplication and addition together in order to check the division.

Note also that casting out nines only works to check an **exact calculation**, including the **remainder**; it can’t check a decimal or rounded answer.

If the check digits don't come out right, you must have made a mistake in your arithmetic (either in the problem you're checking, or in calculating the check digit); but if the check digits agree, your work could still be wrong, such as if you switched two digits when you were copying. In fact, I use a variant of this method when I balance my checkbook. If I get the wrong balance, I know my calculator didn't add wrong, but I may have entered something wrong. If the check digit for my balance is the same as what the bank says, I can guess that I reversed two digits somewhere; if they are different, it's more likely that I dropped a digit, or perhaps a whole transaction.

I don’t do that much any more, but it can still be worth doing!

If you want to know what this has to do with nines, or why it works, check out the other answers I referred to above. The basic idea is that the check digit is essentially the remainder after you divide by 9. (A slightly more advanced way of working with these check digits is to treat a result of 9 as a zero, so that check digits are always between 0 and 8 rather than 1 and 9, making it a genuine remainder.)

This is why it’s called **casting out** nines:

You may notice that when you add the digits of 6395, if you just ignore the 9, and the 6+3 = 9, you still end up with 5 as your check digit. This is because any 9's make no difference in the result. That's why the process is called "casting out" nines. Also, at any step in the process, you can add digits, not just at the end: to do 8051647, I can say 8 + 5 = 13, which gives 4; plus 1 is 5, plus 6 is 11, which gives 2, plus 4 is 6, plus 7 is 13 which gives 4. I never have to work with numbers bigger than 18.
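The running version of the routine can be sketched in Python as follows. This is only one possible arrangement of the idea, with a made-up function name: it skips every 9 outright and reduces the running total as soon as it passes 9, so it may report either 0 or 9 for a multiple of 9 (both mean "no remainder").

```python
def running_check_digit(n: int) -> int:
    """Cast out nines digit by digit, keeping the running total small."""
    total = 0
    for d in str(n):
        digit = int(d)
        if digit == 9:
            continue          # a 9 never changes the result, so cast it out
        total += digit
        if total > 9:
            total -= 9        # same effect as adding the two digits of the total
    return total

print(running_check_digit(6395))     # 5
print(running_check_digit(8051647))  # 4
```

Because only nines are ever removed, the result always leaves the same remainder on division by 9 as the original number.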

This “running check” is very useful when you do this all in your head.

I hope this clarifies what you're doing. It takes a lot of words to explain, but it's really easy to do. Keep at it and you'll get the idea. If you want a simpler explanation of WHY the method works than we have in our archives, write back and I can send that to you too.

We’ll see that next time.

A 2003 question focused on subtraction, and was added to this page:

Dear Dr. Math, I don't understand casting out nines. Can you help me? Here is an example 3942 - 1581 ------- I know that it equals 2361 I think that you would add the numbers together and you would throw out the nines, so it would be 3942 (0) - 1581 (6) so what would the answer be? (3?) How do you get that though ?

Note that Sarah didn’t just add \(3+9+4+2=18, 1+8=9\), but followed the full “casting out nines” procedure, replacing the final single digit of 9 with 0, which is the actual remainder when you divide this number by 9. More on that below!

I suspect that using the method to check subtraction is not taught as often as for addition and multiplication.

I answered again:

Hi, Sarah. I would usually check a subtraction by checking the equivalent addition; here 2361 + 1581 = 3942. In this case, the reduced digit sums are 3, 6, and 0, which is correct since 3+6=9 which reduces to 0. But you can cast out nines to check subtraction directly, if you add one step. In this example you want to subtract 6 from 0. To make the 0 big enough to subtract, you can "borrow" a 9, by adding 9 to the zero. (Just as you reduce by casting out nines, you can grab an extra nine when you need one.) Now you are subtracting 6 from 9, which gives you the 3 you expect.

Just as 9’s can be **ignored** because they are equivalent to zero, you can **add in** a 9 when the result would otherwise be negative. This is the point of the last line in Doctor Rob’s answer above.
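Here is a small Python sketch of the direct subtraction check, assuming the convention Sarah used (a final 9 is written as 0). The helper names are hypothetical.

```python
def reduced_digit_sum(n: int) -> int:
    """Digit sum reduced to a single digit, writing a final 9 as 0."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return 0 if n == 9 else n

def subtract_check_digits(minuend_digit: int, subtrahend_digit: int) -> int:
    """Subtract check digits, 'borrowing' a 9 if the result would be negative."""
    diff = minuend_digit - subtrahend_digit
    if diff < 0:
        diff += 9
    return diff

minuend, subtrahend, claimed = 3942, 1581, 2361
expected = subtract_check_digits(reduced_digit_sum(minuend),
                                 reduced_digit_sum(subtrahend))

print(reduced_digit_sum(minuend), reduced_digit_sum(subtrahend), expected)  # 0 6 3
print(expected == reduced_digit_sum(claimed))                               # True
```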

A 2001 question reveals a “flaw”:

When Casting Out Nines Fails

My son is doing Abeka 6th grade math. I was intrigued with checking math by casting out 9's. I introduced this technique to a friend. Just to prove that casting out 9's worked, I intentionally used an incorrect answer (quotient). I was shocked when the problem checked out correctly. Here is the problem. Divisor 6, Dividend 5223, and Quotient 875. Have I done something wrong? I know the correct answer is 870 R. 3. Really stumped, Teresa Miller

I’m reminded of a story my father used to tell about a demonstration he did in a marketing class in college. As part of his sales pitch for a newfangled self-sealing car tire, he hammered a nail into a tire – which promptly went flat. He had failed to learn that it only worked when the tire was warm from driving, so his demo failed. Here, the demonstration of casting out nines “failed”, because Teresa didn’t know all about what it really does.

I answered:

Hi, Teresa. It's not at all surprising that casting out nines would fail; it puts all numbers into one of 9 categories, so it will catch an error that puts it into one of the 8 wrong categories, but not one that happens to land it back in the "right" category. This means there is a 1 in 9 chance that a random error will look okay. When the method is taught, it should always be pointed out that it can tell you if an answer is wrong, but can't be trusted to tell you that it is right!

The logic of the method is this: If the result is correct, the check digits will agree, so if the check digits disagree, the result must be incorrect. This says nothing about what is true if the check digits agree! (See Why, in Logic, Does “False” Imply Anything?)

In your example, your check should look something like this:

    Claim: 875 * 6 + 0 = 5223
    Check:   2 * 6 + 0 =? 3
                    12
                     3 =? 3

(There are different ways to arrange the work, of course. I calculated the digit sums, then calculated the left side as 12 and found its digit sum to be 3.)

The check digit of 875 is \(8+7+5=20,2+0=2\), so the check digit of the multiplication is \(2\times6=12,1+2=3\); the check digit of the addition is then \(3+0=3\). The check digits match though the result was wrong.

The correct answer, checked, would be Claim: 870 * 6 + 3 = 5223 Check: 6 * 6 + 3 =? 3 39 3 =? 3 You have added 5 to the quotient, which adds 5*6 (or 3) to the digit sum; and you dropped the remainder of 3, which subtracted 3 from the digit sum, so you came out even.

The check digit of the multiplication is \(3+6=9\); the check digit of the addition is then \(9+3=12,1+2=3\).
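To see the "false pass" for yourself, here is a minimal Python sketch of the division check. The function names are illustrative only, and the third call uses a made-up wrong quotient (871) to show an error that the method does catch.

```python
def check_digit(n: int) -> int:
    """Add the digits of n repeatedly until a single digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

def division_check_passes(divisor: int, quotient: int,
                          remainder: int, dividend: int) -> bool:
    """Check quotient * divisor + remainder = dividend using check digits only."""
    left = check_digit(check_digit(quotient) * check_digit(divisor)
                       + check_digit(remainder))
    # Compare modulo 9 so that a check digit of 9 counts the same as 0.
    return left % 9 == check_digit(dividend) % 9

print(division_check_passes(6, 875, 0, 5223))  # True, although 875 * 6 + 0 = 5250
print(division_check_passes(6, 870, 3, 5223))  # True, and 870 * 6 + 3 = 5223
print(division_check_passes(6, 871, 3, 5223))  # False: this error is caught
```

A passing check is therefore only evidence, never proof; a failing check is conclusive.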

We’ll close with one more explanation, for a fifth grader in 2009:

Casting out Nines: How Is 9 the Same As 0? Why Bother?

Dear Dr. Math, I'm homeschooled, and in 5th grade, and am having trouble casting out nines. I've read the directions, and all your other information, and it's too confusing. My parents don't get it either. Also, what's the use? One of my problems is 7326 + 5037 + 2765 + 9932 + 8416. I don't get how 7 + 3 + 2 + 6 = 0, which is the answer that my Teacher's Edition gives.

I answered:

Hi, Samantha. Casting out nines may not be quite as useful today when everyone uses calculators; but then, it can catch errors in entering the data into the calculator, so it might not be a bad idea to use it more than we do! Casting out nines is a fascinating (and rather old) method for checking manual calculations. If you look into why it works, you'll see that it ties into some important bits of number theory. Since I may well have written some of what you read that was too complicated, let's try just using your example rather than trying to explain it fully in general terms.

We need to keep it simple, and I’ve learned that examples often do that. Sometimes, too, I find that the specific problem a student asks about has a unique issue I wouldn’t have noticed.

The basic idea is that we can make what is called a "digital root" by adding all the digits of a number, and then repeating the process until we have a single digit. Let's do that with each of your numbers, and also with their sum:

       7326 -> 7 + 3 + 2 + 6 = 18 -> 1 + 8 = 9
       5037 -> 5 + 0 + 3 + 7 = 15 -> 1 + 5 = 6
       2765 -> 2 + 7 + 6 + 5 = 20 -> 2 + 0 = 2
       9932 -> 9 + 9 + 3 + 2 = 23 -> 2 + 3 = 5
     + 8416 -> 8 + 4 + 1 + 6 = 19 -> 1 + 9 = 10 -> 1 + 0 = 1
     ------
      33476 -> 3 + 3 + 4 + 7 + 6 = 23 -> 2 + 3 = 5

Do you notice something?

Before we continue ... I now see the specific question you are asking. For the first addend, above, I wrote 7 + 3 + 2 + 6 = 18 -> 1 + 8 = 9. Your book says it's 0, not 9. Why? This is actually where the name "casting out nines" comes from. If you work with this method enough, you find that anywhere you have a 9, you can just "cast it out" (throw it away) because it doesn't affect the digital root. For example, for the number 19, you get 1 + 9 = 10 -> 1 + 0 = 1, which is what you'd get if you ignored the 9 in the first place.

We just skip the 9, knowing it’s a waste of time.

But how can 0 and 9 really be the same answer, in our specific case? That's because all this work is based on the remainder you would get if you divided a number by 9. For example, if you divide 19 by 9, you get 2 with a remainder of 1, and 1 is the digital root! So the digital root is the remainder when you divide by 9 ... except when you get a 9! The remainder has to be less than the divisor. So when you get 9, in order to really find the remainder, you have to divide by 9 again -- and now the remainder is 0. So when we cast out nines, we treat 0 and 9 as the same thing; if we get a 9 we can "cast it out" and use 0 as the digital root.

Technically, we say that 9 is *congruent to 0*, modulo 9; that is, as far as we are concerned, they are the “same” number. And since we want the digital root to be less than 9, we just write 0 when we get 9. But, as we’ve seen above, we often don’t actually bother with that.

Back to our process. The digital roots of the addends are now 0 + 6 + 2 + 5 + 1 = 14. We take the digital root yet again, since this has two digits; and find that the digital root of the sum is 5. If we had gotten a different digital root, we'd have known that we made a mistake. But since this agrees with the 5 we got by using the sum itself, we've confirmed our addition.
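Here is a short Python sketch of Samantha's problem, under the convention that a digital root of 9 is written as 0. The name `digital_root` is just a label for the digit-adding routine; the `% 9` step is what turns a 9 into a 0.

```python
def digital_root(n: int) -> int:
    """Add the digits of n repeatedly until a single digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

addends = [7326, 5037, 2765, 9932, 8416]

# Writing each root modulo 9 turns a 9 into a 0, as the Teacher's Edition does.
roots = [digital_root(n) % 9 for n in addends]
print(roots)                                  # [0, 6, 2, 5, 1]
print(sum(roots), digital_root(sum(roots)))   # 14 5
print(sum(addends), digital_root(sum(addends)))  # 33476 5

# The reduced digital root is the remainder on division by 9.
print(all(digital_root(n) % 9 == n % 9 for n in addends))  # True
```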

Remember, again, that “**confirm**” here doesn’t mean **prove**; it just gives supporting **evidence** (that is, it **doesn’t disprove** the claim):

Now, having the same digital root doesn't prove the answer is right -- we could have added wrong but gotten the same result by coincidence -- but it does give us more confidence. I hope that helps a bit. And I hope your text explained this idea of ignoring 9's, and didn't just tell you to add digits, as I did at first. That simplified explanation is actually good enough to use the method (it would have worked fine if we had used 9 rather than 0 for that first digital root). But ignoring 9's makes the work a bit easier.

Next time, more on *why* it works.

The first three questions were almost the same (and almost simultaneous, but independent).

First:

I wanted to ask: which is more accurate? Law of cosine or law of sine? And why?

Second:

Which law is more accurate, law of sine or cosine?

Third:

Why is Law of Cosines more accurate than Law of Sines?

I gave all three the same answer, presuming the question arose from a class discussion, and asking for clarification:

Three of you wrote with this same question; but in order to answer it, I need to know a little more about the question.

First, what do you mean by “accurate”?

Second, why do you think either law would be less accurate?

I can think of ways in which a particular application might call for choosing to solve either a sine or a cosine, because the other would be either a little more work, or likely to produce an inaccurate answer due to rounding; but it is not the law itself that is inaccurate in such a case, but the manner of application to a particular problem may be inappropriate.

Some of this is touched upon in the following posts, in discussing which laws to apply, in what order, and to what parts of a triangle:

Solving an Oblique Triangle, Part I

Solving an Oblique Triangle, Part II

The last example in the second, particularly, makes mention of inaccuracy, and a choice to be made, but similar issues arise elsewhere.

I’ll be happy to discuss these details, if they will help you. Can you tell me a little about where the question arose, and how my answer will be used? Is this a question of personal curiosity, or a class assignment, or something else?

The third questioner, Irene, responded first, telling me about the context:

I think the other 3 are in my class, you see we were solving questions that required Law of Sines and Law of Cosines. When our teacher told us that he googled which was more accurate and it said “Law of Cosines” but didn’t give him a clear answer of why, so he told us if we could find the answer then he would give us extra credit. He particularly recommended this page. But for your question I’m not too sure myself, but I assume when he said “accurate” he meant it as in real-life scenarios where people need to use the exact right numbers in order to build a stable, secure building of some sort.

Thank you for your time.

In the old days, teachers would sometimes assign a class to use *Ask Dr. Math* to answer a question, leading to a flood of the same question; we encouraged teachers to have the class write one question together (or just search for existing answers). That’s rarer today.

Knowing a little more about the context, I responded more deeply:

Thanks. I thought it sounded like a result of a class discussion.

I’d like to see the source that said the law of cosines is more accurate, and what they mean by that. In my opinion, both laws are exact in themselves; but how you apply them might result in error, and sometimes a different choice of what parts you actually measure (and therefore what method you use to solve) can make a big difference.

“Real-life” is going to be a key: If we can choose what to measure, we can improve the accuracy of our results.

I see several issues that might cause errors in solving a triangle:

- Actual errors in the work, such as taking an inverse sine and forgetting to consider both possible angles.
- Using rounded intermediate values, which can cause larger errors in subsequent steps; this can be avoided by following recommended procedures that encourage using known values rather than calculated values when possible, and by using stored values in the calculator rather than copying and reentering results.
- Taking inverse sines or cosines of numbers near 1 (resulting in angles near 90 or 0 degrees, respectively), because, for example, the cosine changes very slowly near 0, which magnifies errors in the inverse cosine. A small rounding error there can cause relatively large errors later.

Only the last of these really involves which law you use, and the usual procedures should preclude these error-prone situations. Often there is only one real choice anyway, but my two posts demonstrate some situations where you do have a choice, and may sometimes want to take these issues into account.

If you consider the graph of the inverse sine, you can see that it is very steep near \(x=1\), so that a small error in the input can result in a large error in the output (here, from \(x=0.96\) to \(x=0.98\)):

The last example in Solving an Oblique Triangle, Part II, as I mentioned, talks about these things, and in particular the student compared two approaches, one starting with the Law of Sines, and the other with the Law of Cosines. In that case, the latter gave the “correct” answer, but that was largely due to the problem being bad! But we also showed the risk in taking the inverse sine to find an angle close to 90°.
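The steepness of the inverse sine near 1 is easy to check numerically. The following Python sketch compares the same input change of 0.02 in two places; the interval from 0.96 to 0.98 is the one mentioned above, while the interval from 0.10 to 0.12 is an arbitrary comparison chosen here for contrast.

```python
import math

def asin_deg(x: float) -> float:
    """Inverse sine, returned in degrees."""
    return math.degrees(math.asin(x))

# Same change in the input (0.02), very different change in the angle.
steep = asin_deg(0.98) - asin_deg(0.96)    # near x = 1
gentle = asin_deg(0.12) - asin_deg(0.10)   # near x = 0

print(round(steep, 2), round(gentle, 2))   # roughly 4.78 and 1.15 degrees
```

So an input error near 1 is magnified about four times as much as the same error near 0, and the effect grows without bound as the input approaches 1.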

If your teacher happens to tell you more about the source of the claim, or the particulars of their concerns, we may have more of a discussion. I wish I were there in your class!

Irene replied:

Thank you so much for this response, my teacher and I really appreciate it! I’ll follow up with any more questions if he has any more. Thank you for your time!

In fact, she will.

The second questioner, Haroon, then said a little about the meaning of “accurate”:

This is a class assignment to find out which law is more accurate, meaning which law is to the exact point instead of an approximate. That is what I understand or, to put it in other words, which law is better and gives the most right answer.

I responded:

The quick answer is that both are exact, not approximate. They are theorems that have been proved.

They are also both necessary; there are some problems that require one, some that require the other, and some that require using both (though sometimes either could be used). We can’t always decide to use one rather than the other, even if one were more accurate.

But they are both tools that we can use in various ways, and can give inaccurate results because we can’t (usually) evaluate sines and cosines and their inverses exactly. It is necessary to be aware of the accuracy of your calculator, and of how precision can be lost in subsequent calculations.

Finally, I am unaware of any tendency for either to be more sensitive to errors than the other.

So in the end, the answer is, neither is better or more accurate.

We’ll need a specific example to make things clear.

Almost a month later, Irene provided just that:

Following with the same question, my teacher did this problem and got 2 different answers. Let’s say if someone is making a rocket ship, which one would they pick to launch someone at the right amount so they don’t go too far or not far enough? You have mentioned that they are both the same but when we calculate this we get 2 different answers.

The first solution to the problem shown at the bottom uses the Law of Sines applied to angle *A* and sides *a* and *b*. The second solution applies the Law of Cosines to sides *a*, *b*, and *c*.

Now there was much more for me to say:

Thanks. This is very similar to the last example in Solving an Oblique Triangle, Part II, which I’ve mentioned already, titled “When everything goes wrong!”. This problem is overspecified (that is, there is more data than we need), and the difference in results comes from using different subsets of the data, which themselves are inconsistent.

You are given four numbers; we only need three to specify a triangle.

If I assume the side lengths are correct (which, as I mentioned in the post, I tend to trust more), I get A = 85.2198°, not exactly 85°. So the angle has been rounded.

If the sides are measured correctly, then the given angle is not quite accurate.

So in your first method, which uses A = 85°, a = 8, and b = 7, you get a result that is a little off due to the rounded value of A. You don’t use c = 4.5 at all. If we then calculate the length of c, we find it to be 4.53, not exactly 4.5.

The Law of Sines gives an inexact answer because it uses an angle that is inexact. If, instead, we trust the data used here, then the side we didn’t use turns out not to be the exact length shown.

(This SSA triangle happens to be unambiguous. If it weren’t, we could use the extra data to decide which to choose.)

In your second method, you use only a = 8, b = 7, and c = 4.5, and not the presumably rounded value of A, so your 60.6868° is presumably the correct value for angle B. But if it should turn out that the 4.5 measurement was rounded and A was exact, then the first answer would be better.

The two answers are different, not because one *method* is inaccurate, but because the *data* are inconsistent.
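The numbers quoted in this exchange can be reproduced with a short Python sketch. It assumes, as the discussion suggests, that both methods are solving for angle B, with a = 8, b = 7, c = 4.5, and A = 85°; the variable names are illustrative.

```python
import math

a, b, c = 8.0, 7.0, 4.5
A_given = 85.0  # degrees; apparently a rounded measurement

# Method 1: Law of Sines, using A, a, b only (c is ignored).
B_sines = math.degrees(math.asin(b * math.sin(math.radians(A_given)) / a))

# Method 2: Law of Cosines, using a, b, c only (A is ignored).
B_cosines = math.degrees(math.acos((a**2 + c**2 - b**2) / (2 * a * c)))

# What the three sides imply about A, and what method 1's data imply about c.
A_from_sides = math.degrees(math.acos((b**2 + c**2 - a**2) / (2 * b * c)))
C_sines = 180.0 - A_given - B_sines
c_from_sines = a * math.sin(math.radians(C_sines)) / math.sin(math.radians(A_given))

print(round(B_sines, 2), round(B_cosines, 4))          # about 60.65 vs 60.6868
print(round(A_from_sides, 4), round(c_from_sines, 2))  # about 85.2198 and 4.53
```

Each method is exact for the three numbers it uses; the two results differ only because the four given numbers do not describe the same triangle.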

Here are the two triangles: ABC by SSS (method 2), and AB’C by SSA (method 1).

At the bottom of the same post, I make comments about “real life”, where you tend to have, on one hand, too much data, and on the other, inaccurately measured data. That will make it inconsistent as well. Then you need to choose the best-measured data, and perhaps also calculate using other data and average different results.

We can either make a judgment as to which data we trust most, or use different subsets of data and average out their differences.

Now, I find your question interesting in light of what I just said. Your example of a rocket is a real-life question to which my last comments apply! If this were such a life-and-death problem, then you would have to decide whether your angle was measured more accurately than the distances, and make the best use of the data you have.

But the important thing is that it is not a matter of whether using sines or cosines is more accurate, but of which of the data are more accurate.

Meanwhile, a fourth student had written the day after the first three, with a slightly different perspective:

In my math class we were discussing that law of cosine was generally more accurate than law of sine. My teacher knew this, but when he googled it he couldn’t find out why. He challenged us to research and find out why this is the case. Why is law of cosine more accurate than law of sine?

I answered:

Hi, Madison.

I don’t see that either is “more accurate” than the other. They’re just two tools that can be used to solve triangles, and sometimes choosing one over the other might happen to be wiser (if you even have a choice).

Can you tell me what evidence your class, or your teacher, has for this claim? We can’t talk about “why” until we agree on “what” is true.

I’d be interested to see what particular problem (or kind of problem) you were discussing when you decided the law of cosines is better.

Madison responded:

Hello,

I think our teacher said he learned it in college and tried to google it, but could only find some ChatGPT answer saying law of cosines is more accurate, but not why. It seems more likely to me now that his claim might be incorrect. This is the only thing I can find on the topic, and I am not sure what my teacher read, but since it’s the only source I can find, I question its validity:

Quora: Why is the law of cosines more accurate than the law of sines?

This gives an overspecified example like ours above, and mentions both the ambiguous case and values near 1. (We’ll look at ChatGPT later.)

Madison also copied a pair of problems that apparently raised the question. First:

7-99: A bridge is being designed to connect two towns (one at point A and the other at point C) along the shores of Lake Toftee in Minnesota. Lavanne has the responsibility of determining the length of the bridge.

Since he cannot accurately measure across the lake (AC), he measures the only distance he can by foot (AB). He drives a stake into the ground at point B and finds that AB = 684 ft. He also uses a protractor to determine that \(m\angle B=79^\circ\) and \(m\angle C=53^\circ\). How long does the bridge need to be?

This is an AAS problem, solved by the Law of Sines: $$\frac{x}{\sin79^\circ}=\frac{684}{\sin53^\circ}$$ $$x=\frac{684\sin79^\circ}{\sin53^\circ}=840.725$$

Second:

7-100: Lavanne is not convinced that his measurements from problem 7-99 are correct. He decides to calculate the distance again between towns A and C using a different method to verify his results.

This time, he drives a stake in the ground at point D, which is 800 feet from town A and 694 feet from town C. He also determines that \(m\angle D=68^\circ\). Using these measurements, how wide is the lake between points A and C?

Does this process confirm the results from problem 7-99?

This is an SAS problem, solved by the Law of Cosines: $$x^2=800^2+694^2-2(800)(694)\cos68^\circ=705,672.839$$ $$x=\sqrt{705,672.839}=840.043$$

The two answers are close, but round to different numbers of feet. How close do they have to be to confirm the answer?
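Both textbook calculations can be reproduced in a few lines of Python. This is just a numerical restatement of the two formulas above, with illustrative variable names.

```python
import math

# Problem 7-99 (AAS): Law of Sines with AB = 684, angle B = 79, angle C = 53.
ac_sines = 684 * math.sin(math.radians(79)) / math.sin(math.radians(53))

# Problem 7-100 (SAS): Law of Cosines with AD = 800, CD = 694, angle D = 68.
ac_cosines = math.sqrt(800**2 + 694**2
                       - 2 * 800 * 694 * math.cos(math.radians(68)))

print(round(ac_sines, 3), round(ac_cosines, 3))  # about 840.725 and 840.043
print(round(ac_sines - ac_cosines, 3))           # a difference of roughly 0.68 ft
```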

Madison said,

For #99 we got 840.7 feet for the bridge using law of sine. For #100 we got 840.04 feet for the bridge. He often talks about rounding, and how important it would be if we had the real life job of making said bridge, because it could not be too short.

That is where this topic arose. If the answer is in fact that neither one is really “more accurate”, I am curious to know where my teacher got that idea from.

I am also curious if the slight difference in answer is based purely on the different formulas for each method.

Thank you!

I answered:

Thanks! This is exactly what I was hoping for.

I agree with your calculations. For part 1, I get 840.725, and for part 2, I get 840.043.

Why are these different? The error is caused by … rounding!

As in the example at the end of the post I referred you to, lengths are easier to measure accurately, while angles appear to have been rounded to the nearest degree, which introduces error, not in your work, but in the data you were given to work with.

Once again, the difference is in the inaccuracy of the data provided for the two calculations.

Here, I made a drawing (in GeoGebra), supposing that C is 840 feet from A, and using the distances given, as well as the 53° angle, to place B and D, in order to see what the angles at those points turn out to be:

In part 1 (givens in green), the angle at B rounds to the stated 79°; and in part 2 (givens in red), the angle at D rounds to the stated 68°. The latter is more accurate, so the answer obtained from it is more accurate (assuming my 840 is correct.)

I constructed B as an ambiguous SSA triangle; angle B could be either 78.7486° or 101.2514°, but I chose the one that agrees with the problem statement. These could be found by either the Law of Sines or the Law of Cosines (with equal accuracy!):

I constructed D as an SSS triangle; angle D is unambiguous.

Presumably the problem was designed this way, by rounding the angles.
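The angles in that drawing can be approximated numerically. The Python sketch below makes the same assumptions as the drawing (AC is exactly 840 feet, and angle C is exactly 53°) and recovers the unrounded angles at B and D; the variable names are illustrative.

```python
import math

AC = 840.0  # assumed true length, as in the drawing

# Angle B from SSA data (AB = 684, AC = 840, angle C = 53): two candidates.
sin_B = AC * math.sin(math.radians(53)) / 684
B_acute = math.degrees(math.asin(sin_B))
B_obtuse = 180.0 - B_acute

# Angle D from SSS data (AD = 800, CD = 694, AC = 840): unambiguous.
D = math.degrees(math.acos((800**2 + 694**2 - AC**2) / (2 * 800 * 694)))

print(round(B_acute, 2), round(B_obtuse, 2))  # about 78.75 and 101.25
print(round(D, 2))                            # about 68.0
```

Under these assumptions, the stated 79° is off by about a quarter of a degree, while the stated 68° is off by only a few thousandths of a degree, which fits the observation that the second answer came out closer to 840.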

So the difference is not caused by the method, but by the data. If there is a real-world lesson here, it may be

- Trust distances more than angles, and try to use fewer angles if possible.
- Trust no measurement too much; be aware of the precision of each.
- If you need to be sure your answer is not too small, round your answer up.
- Doing two separate calculations using separate data to check is a good idea.

In the real world, we would have had to actually measure angles and distances; the accuracy of both depends on the instruments used. A protractor, as claimed in the problem, would only read the nearest degree. It is not clear how the distances would have been measured.

The benefit of redundant data in real life is demonstrated in the last example in Area of a Plot of Land.

Now, I was at a short talk a couple weeks ago about AI and education, one point of which was that ChatGPT is much better at looking correct than it is at being correct. My playing with it has confirmed that. (If your teacher would be willing to share what he found, I’d be interested.)

See below.

As to what your teacher recalled, there may well have been some exercises like this one, or some particular cases (such as taking the inverse cosine of a number near 1) that led to that impression (probably aided by the fact that the second answer here looks more accurate!).

I did the same search you did and found nothing really relevant … just as I expected.

Again, thanks for your help. This was as much fun as I hoped it would be.

After this, I asked ChatGPT, and got a reasonable answer:

Not bad. But clearly more can be said,


We’ll introduce the issues with a quick question from a year and a half ago:

Dear Doctors

So I’m currently doing counting principles in my math module and I’m so lost. I know that with permutation order matters and with combinations order doesn’t matter. But even though I know this, I still can’t seem to understand the difference even after asking around and watching YouTube videos. It just doesn’t make sense to me. Could someone please explain it in a way that I’ll understand.

Thanks.

The problem with “order matters” is that it is a very brief summary of some big ideas, and teachers sometimes seem to think that those words are enough to make everything clear.

I answered:

Hi, Anthony.

Let’s first see whether this post in our blog helps:

Permutations and Combinations: An Introduction

There, we answered several questions about this from (or for) students at about your level.

Then, if you have questions that doesn’t answer, or if you try some problems and still get confused, write back with those questions and we’ll see if we can deal with them.

You don’t really know whether you understand a concept until you use it, so actual examples will be better than talking about ideas generally.

But if I had to give you a quick answer, I might say something like this (which I think is a little clearer than “order matters”):

Permutations are arrangements of items in a row. So we are counting the number of ways you can arrange some or all of a set of objects in specific slots.

Combinations are subsets of a set of items. So we are counting the number of ways you can select some number of them. You might picture just tagging each one you want, rather than putting them somewhere. (When you put them on display, you might be permuting them!)

Anthony didn’t reply to let us know what his specific needs were; but we’ll be looking at examples as we proceed!

When we count permutations, we are considering ways to **choose and arrange** some or all of a set of items, putting them into a fixed set of places, such as a **shelf:**

When we count combinations, we are considering ways to merely **choose** items from a set, as if we are just tossing them in a **bag**:

For what follows, recall (from the page referred to, or elsewhere) that:

- **Factorial**, written as \(n!\), means the product \((n)(n-1)(n-2)\dots(2)(1)\), and counts the number of permutations of an entire set of *n* items.
- **Permutations**, written as \(_n\text{P}_r\) or \(P(n,r)\), counts the number of **arrangements** of *r* items chosen from *n* items; in particular, \(_n\text{P}_n=n!\). This is often read as “permutations of *n* things taken *r* at a time,” agreeing with the order in the notation.
- **Combinations**, written as \(_n\text{C}_r\), or \(C(n,r)\), or \({n\choose r}\), counts the number of **subsets** of *r* items chosen from *n* items. This is often read as “combinations of *n* things taken *r* at a time.”
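If you want to play with these three quantities, Python's standard `math` module (version 3.8 or later) provides them directly. The values of *n* and *r* below are arbitrary choices for illustration.

```python
import math

n, r = 5, 2

print(math.factorial(n))  # 120: arrangements of all 5 items
print(math.perm(n, r))    # 20: arrangements of 2 items chosen from 5
print(math.comb(n, r))    # 10: subsets of 2 items chosen from 5

# Each subset of r items can be arranged in r! ways, so P(n, r) = C(n, r) * r!
print(math.perm(n, r) == math.comb(n, r) * math.factorial(r))  # True

# Taking all n items: n! arrangements, but only one subset.
print(math.perm(n, n) == math.factorial(n), math.comb(n, n))   # True 1
```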

Now we come to one of the questions from Ivka, from mid-February:

Dear Dr. Math,

Can you help me with this question? I think I got the answer but I am not understanding one thing. In this word problem, the order does not matter. So I first assumed it’s combinations. But then I was getting a low count for the number of ways the event can happen. And the answer choices for wrong answers were all listed, such as 1 way or 6 ways. But I kept thinking logically that more ways are possible.

I opened my textbook and found a very similar problem that calculated the question with permutations! Arranging 6 items on the shelf is just 6! or 6P6 or a fundamental counting principle can be used.

So I calculated my problem the same as the textbook, and I got 720 ways or 6!.

The givens are:

Jessie has 6 trophies and wishes to arrange them in a single file line on the shelf. In how many different ways can 6 trophies be arranged?

I calculated combinations of 6 things taken 6 at a time but only got 1 way to arrange 6 trophies on the shelf, which of course, is illogical! Even though the answer was listed among the answer choices, I did not select it! Next, I calculated combinations of 6 things taken one at a time thinking I was arranging 6 trophies one at a time and got 6 ways. It did not sound right to me because there are many more ways to arrange trophies on a shelf. Then, I thought of the fundamental counting principle: the first trophy can be arranged in 6 ways, the second trophy in 5 ways, the third trophy in 4 ways, the fourth trophy in 3 ways, the fifth trophy in 2 ways, and the last trophy in 1 way. Then I looked in the book. I found this problem:

In how many ways can seven books be arranged on the shelf?

The solution used the fundamental counting principle or 7P7 to find the solution: 5040 ways. This is the number of arrangements of 7 books taken all at a time.

Could you explain please why in this case the order does NOT matter but permutations are used? It’s counterintuitive. I used to think that the order does not matter with combinations, which still is true. It’s interesting to find a way when order does not matter and permutations are used.

Thank you so much for your help.

Ivka

There’s a lot of thinking there, and a mystery for us! Why is Ivka convinced that order doesn’t matter, when all the evidence points to permutations as the correct approach? We’ll have to ask questions to find out what sort of thinking is being done.

Doctor Rick answered:

Hi again, Ivka.

The problem is:

Jessie has 6 trophies and wishes to arrange them in a single file line on the shelf. In how many different ways can 6 trophies be arranged?

You say, “In this word problem, the order does not matter.” Why do you say this? You said it several times, but in fact order does matter in this problem. That is indicated by the word “arrange”. Arranging books on a shelf means choosing the order of the books on the shelf: Which book will be on the far left? Which book will be to the right of that book? And so on.

Here are some trophies, showing a couple of the ways to arrange them in different orders:

Regardless of what caused you to think that order does not matter in this problem, we need to think a little more deeply than a formulaic “if order matters, use permutations; if not, use combinations.” Sometimes both are needed, sometimes neither, and the attributes that determine how we handle a problem are not always obvious from the language of the problem. You might find helpful guidance in our blog post: Permutations and Combinations: An Introduction. I could point to other blog posts about more challenging combinatorics problems, but we probably should stick with the kinds of combinatorics problems you are seeing now.

Sometimes, as shown in other posts, we may even *repeatedly switch* between permutations and combinations, and other ideas!

Ivka wrote back, explaining her thinking:

Dear Dr. Rick,

Thank you for your thorough explanation. I appreciate your time to answer my lengthy question.

I learned in the past that we use permutations when order matters; for example, when people are racing for places and it matters who wins the first, second, or third place. So order matters in this case. Likewise, when people are being selected for the positions of president, VP, and secretary. Order matters because a person who qualifies for a VP might be a bad secretary and will likely have bad customer service. VPs have an ego! Secretary doesn’t.

So, what does it really mean when we say, “order *matters*”? In what sense does it need to “matter”?

Doctor Rick responded:

What you’re describing, in the examples of placing in a race and of positions in a club or company, can also be described as “distinguishable” places. It’s not necessarily that one place is more desirable than another, or more suited to a particular person, but simply that the different places can be told apart (distinguished). A club with John as President and Jennifer as Secretary is different from a club with Jennifer as President and John as Secretary.

A permutation is an ordering of items, in which the order “matters” because you can tell one order apart from another, and you are asked to do so. It is not a question of whether someone cares!

If we were selecting, say, three runners to advance to the next heat, or two students to be co-leaders, then we would not distinguish between the positions, and would count using combinations, even if the runners really prefer to be first, or John cares whether he got more votes.

In your problem, perhaps you are thinking that no one position on the shelf is better than another position, but they can be distinguished! If I tell you to get me the third trophy from the left, you can do that. If the trophies were arranged differently on the shelf, the third trophy from the left might be a different trophy. That’s why we need permutations here: we aren’t just interested in which trophies are on the shelf, but how they are ordered on the shelf.

One might ask *why* we care about the order of the trophies. It could be that we want them in chronological order, or because we want the colors to go well together, or because John wants his trophy to be prominent. But as far as the problem is concerned, we “care” only because that is what we were asked to do: count different ways to place them on the shelf.
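To make “distinguishable positions” concrete, here is a small Python sketch that simply lists every arrangement of six trophies. The letter labels are hypothetical stand-ins for Jessie’s trophies; nothing here comes from the original exchange beyond the count itself.

```python
from itertools import permutations

# Hypothetical labels standing in for the six trophies.
trophies = ["A", "B", "C", "D", "E", "F"]

# Every way to line the trophies up on the shelf, left to right.
arrangements = list(permutations(trophies))
print(len(arrangements))  # 720, which is 6!

# The same six trophies in two different orders are counted as different arrangements.
print(arrangements[0], arrangements[1])

# "Third from the left" is a distinguishable position: across all the
# arrangements, every trophy gets a turn in that spot.
third_from_left = {arrangement[2] for arrangement in arrangements}
print(sorted(third_from_left))  # ['A', 'B', 'C', 'D', 'E', 'F']
```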

This is the sort of thing that teachers often fail to see; we think we are communicating clearly, but students may hear something very different. That’s a tremendous benefit of a service like ours, where we get to see things from students’ perspective, and learn how we can be misunderstood.

A few days later, Ivka (clearly a very diligent and thoughtful student) wrote again with the opposite problem: choosing permutations when combinations were right:

Hello,

I missed a word problem because I solved it as a permutation but it is a combination.

The givens are:

Jenny is packing for a weekend getaway. She has 8 dresses but can only fit 3 in her bag. How many different groups of 3 dresses can she bring with her?

I used a permutation formula because the groups of dresses must be different. In other words, each group of dresses must have three different dresses. I am now listening to the solution video and it says that ‘dresses are placed in the same bag thus the order does not matter’. I agree with their idea but how about the givens in the problem asking for a group made up of 3 DIFFERENT DRESSES. Since the same dress cannot be counted twice in one group, I said repeats are not allowed. Thus, combinations are not possible.

I should have thought about or at least contemplated briefly on combinations vs permutations before I started solving the problem. It seemed reasonable to me that since the dresses must be different in each group, we must have permutations here as combinations give us repeats. In other words, we might end up packing the same dress twice; wow, or maybe that’s not possible to have a combination because we cannot have repeats of the same dress in a group but we still can have repeats of the dresses. For example, a black dress, pink dress, and white dress are the same combination as pink, black, and white dresses. That’s why I eliminated combinations as a possibility.

Could you improve my thinking here, please? I was shocked I missed this problem! Thank you very much for your wonderful service.

By the way, the correct answer is 8C3=56 ways. My answer was 8P3=336 ways. Both answers were included among answer choices for a student to fall into a trap. I fell right in a big pothole and got hurt! I missed the answer!

Sincerely,

Ivka

There are several things going on here that need to be untangled. There are “different groups” and “different dresses”, and “different orders” that count as the “same combination”.

This time I answered, taking one bit at a time:

Hi, Ivka.

I’ve been taking my time, trying to understand your thinking, to see what needs to be corrected.

First, here’s how I might approach the problem:

She is choosing 3 dresses to put in the bag. It doesn’t matter where or when each dress goes in the bag, so all she is doing is selecting a subset. That’s what a combination is; a permutation is arranging a subset in a certain order, which is not what she is doing. So the number of ways the bag can be packed is 8C3.

A **bag** can be considered a model of a **set**, with no distinguished locations within it; that means we’re counting **combinations**. Sometimes it is not entirely clear whether we should pay attention to “where or when” (location within a list, or order in which things are added to it); part of this comes with seeing enough such problems to understand the language being used.

Now, let’s think about where you are going wrong.

You say,

I used a permutation formula because the groups of dresses must be different.

There is only one group (bagful) at a time; we are counting different possible choices for filling that bag. The issue at this point is, What choices do we consider different? Here, it’s just which dresses are in the bag, not their placement within it (or the order we put them in). The main issue is not that the groups we count must be different, but how they are different. And the answer is, only in terms of content, not order.

That’s why this is a combination question: Order doesn’t matter; that is, **different orders are not distinguished**. Any counting problem means counting “different” possibilities; we have to think about which possibilities count as being different.
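A short Python sketch can show what “different orders are not distinguished” does to the count. The dress names are hypothetical placeholders; the only facts taken from the problem are the numbers 8 and 3.

```python
from itertools import combinations, permutations

# Hypothetical names for the eight dresses.
dresses = [f"dress{i}" for i in range(1, 9)]

bags = list(combinations(dresses, 3))          # which dresses, order ignored
ordered_picks = list(permutations(dresses, 3)) # which dresses AND in what order

print(len(bags), len(ordered_picks))  # 56 336

# Forget the order of each ordered pick, and the 336 collapse to 56 bagfuls:
# each bagful was listed 3! = 6 times.
distinct_contents = {frozenset(pick) for pick in ordered_picks}
print(len(distinct_contents))  # 56
```

Both lists contain only groups of three distinct dresses; the difference is entirely in whether reorderings of the same three dresses are counted separately.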

Here is a set of eight dresses, showing a choice of 3 of them:

But where the word “different” comes in the sentence is also important; “different groups” and “different dresses” are, well, *different*:

Then you say,

In other words, each group of dresses must have three different dresses.

This is not the same as the previous statement. The fact that the dresses in the bag must be different is merely because there is only one of each dress. In a different problem, there might be multiple dresses of the same design, which would be considered “the same”, so that she might pick more than one of “the same dress”; that would be what we call repetition. But that is not true here, and it is not what distinguishes permutations from combinations.

Both permutations and combinations are about choosing **distinct items** from a set, and do not allow repetitions. (The term “permutations with repetition” is sometimes used, but is really a misnomer; this terminology is found in How Many Different Meals Are Possible?; typically, this is modeled as counting words that can be made with a set of letters, independently choosing a letter for each location rather than starting from a fixed set of objects.)

You say,

Since the same dress cannot be counted twice in one group, I said repeats are not allowed. Thus, combinations are not possible.

But that is not a difference between permutations and combinations. Repetition is not allowed in either; combinations are subsets of distinct items, and permutations are arrangements of distinct items. Both are about distinct items.

On one hand, there is only one of each dress; on the other hand, each group we count must contain a different set of dresses.

Again, you say,

combinations give us repeats. In other words, we might end up packing the same dress twice

Can you explain what you mean by this?

I think you are explaining it when you say,

maybe that’s not possible to have a combination because we cannot have repeats of the same dress in a group but we still can have repeats of the dresses. For example, a black dress, pink dress, and white dress are the same combination as pink, black, and white dresses. That’s why I eliminated combinations as a possibility.

Here you are just saying that order doesn’t matter, which is why it is a combination; different permutations can contain the same items as one another, but that is not a repetition within a set; it’s just duplication between different choices, which is irrelevant.

Different permutations are distinguished both by what items they contain, and by their order; so two different permutations may contain the same set of items. The same combination may be listed in different orders, but that doesn’t make them repetitious.

For more about what we mean by “distinct” or “distinguishable”, see Combinatorics: Multiple Methods, Subtle Wording.

I hesitate to send beginning students to these more advanced examples, which go beyond what they probably have seen; but it can be helpful just to get a glimpse of what more can be done.


My exploration of this idea began with this question from 2000:

Volume of a Frustum-Like Structure I have a pyramid-like structure with a rectangular base and rectangular top, i.e. the top of a rectangular pyramid has been removed. I have tried using three different methods to calculate the volume. I was told it's called a frustum. Which one do you suggest? Top: 73 by 37 Bottom: 46 by 10.5 Angle: 18 degrees Height: 4.5

Here is a scale model of this, but turned upside-down (large base on the bottom) the way we usually draw a frustum:

I’ve extended the red edges to show the problem: This is not a frustum of a **pyramid**, because the edges don’t meet in a point, the apex of a pyramid. The entire shape is more like a hipped roof.

It appears that the 18° angles are the approximate slant of the faces: \(\arctan\left(\frac{4.5}{(73-46)/2}\right)=18.43^\circ\); \(\arctan\left(\frac{4.5}{(37-10.5)/2}\right)=18.76^\circ\); so this is consistent with the other data; we don’t need this information separately, but it may have been used in one of the attempts to find the volume. If this were an actual rectangular pyramid, the sides would have different slopes.

I answered, starting with the formula we saw last time for a true frustum:

Hi, Alison. I don't know what three methods you have, but our Formula FAQ has a formula: http://mathforum.org/dr.math/faq/formulas/faq.pyramid.html V = h(B1 + B2 + sqrt[B1*B2])/3 Unfortunately, your shape is not really a frustum of a pyramid, because the top and bottom are not similar, so if you continued the edges up to the point of the presumed pyramid, they would not meet. So that formula does not work.

The volume of a true frustum is the height times the average of three areas: the two bases and their geometric mean \(\sqrt{B_1B_2}\).

We used the fact that any cross-section of a pyramid is similar to the base last time. Here, the smaller base is a narrower rectangle than the other, with a ratio of \(10.5/46=0.228\), compared to \(37/73=0.507\).

So, what formula is there for this shape?

I used calculus, though it probably wasn't entirely necessary, to find a formula for this sort of shape where two rectangles are joined by straight edges, without having to be a frustum, and I got this:

[Diagram: a frustum-like solid with bottom rectangle a1 by b1, top rectangle a2 by b2, and height h]

V = [a1*b1 + a2*b2 + (a1*b2 + a2*b1)/2] * h/3

Here the bottom is a1 x b1, and the top is a2 x b2, with the a's parallel and the b's parallel. Notice that a1*b1 = B1 and a2*b2 = B2 in the frustum formula; but the third term "averages" the two in a different way, which works out to be the same if a1/b1 = a2/b2. Try both formulas for your shape, and you'll find they give different results.

We’ll see a different version of the formula below, which makes the third term more clearly an average. For now, here is our formula: $$V=\frac{h}{3}\left(a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right)$$

Note that it doesn’t matter which base we call 1 or 2; the formula is the same.
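Taking up the suggestion to try both formulas, here is a minimal Python sketch. The function names are my own labels, and the numbers are Alison’s dimensions with the height taken as 4.5, the value used in the slope calculation above.

```python
import math

def pseudo_frustum_volume(a1, b1, a2, b2, h):
    """V = (h/3) * (a1*b1 + a2*b2 + (a1*b2 + a2*b1)/2) for two parallel rectangles."""
    return h / 3 * (a1 * b1 + a2 * b2 + (a1 * b2 + a2 * b1) / 2)

def true_frustum_volume(B1, B2, h):
    """V = (h/3) * (B1 + sqrt(B1*B2) + B2); valid only when the bases are similar."""
    return h / 3 * (B1 + math.sqrt(B1 * B2) + B2)

# Alison's shape: 73 by 37 and 46 by 10.5, assumed height 4.5.
general = pseudo_frustum_volume(73, 37, 46, 10.5, 4.5)
misapplied = true_frustum_volume(73 * 37, 46 * 10.5, 4.5)

print(general)     # 6627.375
print(misapplied)  # noticeably smaller, since the bases are not similar

# Swapping which base is called 1 or 2 does not change the result.
print(general == pseudo_frustum_volume(46, 10.5, 73, 37, 4.5))  # True
```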

If you are supposed to be able to figure this out for yourself, I'll try to see how you could do it; otherwise, just use this formula, which I don't recall ever seeing.

The important thing is that the figure for which this formula applies is more general than the rectangular frustum, and therefore can be used for both frustums and these pseudo-frustums.

I’ll show the calculus derivation below; for now, I want to show a geometrical derivation like those I did last time:

We can dissect the figure into a rectangular prism (with base \(a_1\times b_1\) and height *h*), four quarters of a pyramid (with base \((a_2-a_1)\times(b_2-b_1)\) and height *h*), and halves of two triangular prisms (one with triangular base \((a_2-a_1)\times h\) and “height” \(b_1\), and another with triangular base \((b_2-b_1)\times h\) and “height” \(a_1\)):

The volume is:

$$V=V_{prism}+V_{pyramid}+V_{tri-prisms}\\=a_1b_1h+\frac{1}{3}(a_2-a_1)(b_2-b_1)h+\frac{1}{2}b_1(a_2-a_1)h+\frac{1}{2}a_1(b_2-b_1)h\\=\frac{h}{6}\left[{\color{Red}{6a_1b_1}}+2(a_2b_2{\color{Blue}{-a_2b_1}}{\color{Green}{-a_1b_2}}+{\color{Red}{a_1b_1}})+3({\color{Blue}{a_2b_1}}{\color{Red}{-a_1b_1}}+{\color{Green}{a_1b_2}}{\color{Red}{-a_1b_1}})\right]\\=\frac{h}{6}\left[{\color{Red}{2a_1b_1}}+{\color{Green}{a_1b_2}}+{\color{Blue}{a_2b_1}}+2a_2b_2\right]\\=\frac{h}{3}\left[a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right]$$

The work here was virtually identical to the dissection method we saw last time; and if the bases are proportional so that the figure is an actual rectangular frustum, we get the frustum formula:

If \(b_1=ka_1\) and \(b_2=ka_2\) (so that both rectangles have the same proportions), then this formula becomes

$$V=\frac{h}{3}\left[a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right]\\=\frac{h}{3}\left[a_1(ka_1)+a_2(ka_2)+\frac{a_1(ka_2)+a_2(ka_1)}{2}\right]\\=\frac{kh}{3}\left[a_1^2+a_2^2+\frac{a_1a_2+a_2a_1}{2}\right]\\=\frac{kh}{3}\left[a_1^2+a_2^2+a_1a_2\right]$$

while the frustum formula gives

$$V=\frac{h}{3}(B_1+\sqrt{B_1B_2}+B_2)=\frac{h}{3}(a_1b_1+\sqrt{a_1b_1a_2b_2}+a_2b_2)\\=\frac{h}{3}(a_1(ka_1)+\sqrt{a_1(ka_1)a_2(ka_2)}+a_2(ka_2))\\=\frac{kh}{3}(a_1^2+\sqrt{a_1^2a_2^2}+a_2^2)=\frac{kh}{3}(a_1^2+a_1a_2+a_2^2)$$

They agree.
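The agreement can also be spot-checked numerically. This is only a sketch with arbitrarily chosen values; it assumes both rectangles have the same width-to-length ratio `k`, which is what makes the solid a true frustum.

```python
import math

def pseudo_frustum_volume(a1, b1, a2, b2, h):
    """General formula for two parallel rectangles joined by straight edges."""
    return h / 3 * (a1 * b1 + a2 * b2 + (a1 * b2 + a2 * b1) / 2)

def true_frustum_volume(B1, B2, h):
    """Frustum formula in terms of the two base areas."""
    return h / 3 * (B1 + math.sqrt(B1 * B2) + B2)

# Arbitrary illustrative values; k is the common ratio of the sides of each rectangle.
a1, a2, k, h = 3.0, 5.0, 0.4, 2.0
b1, b2 = k * a1, k * a2

general = pseudo_frustum_volume(a1, b1, a2, b2, h)
frustum = true_frustum_volume(a1 * b1, a2 * b2, h)
closed_form = k * h / 3 * (a1**2 + a1 * a2 + a2**2)

print(math.isclose(general, frustum), math.isclose(general, closed_form))
```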

The next question came six months later:

Volume of a Trapezoidal Solid I have a volume that is 75 ft long. The front of the figure is 57 ft wide, 35 ft high, while the rear is 72 ft wide, 12 ft high. I know how to figure the volume L*W*H and the area L*W, but how do I account for the slope of the ceiling and the opposite widths?

Here I’ve drawn it flipped onto the front, putting it in our usual orientation for a frustum:

Having done some more thinking this time, I answered again:

Hi, Greg. It sounds like your shape can be thought of as a 57x35 foot rectangle at the front, joined by planes to a 72x12 foot rectangle at the back, 75 feet away. Here is a formula for the volume of that shape, which I will draw as a frustum-like figure with rectangular top and bottom rather than front and back (since that is the form in which I have most often dealt with it):

[Diagram: a frustum-like solid with bottom rectangle a1 by b1, top rectangle a2 by b2, and height h]

V = [a1*b1 + a2*b2 + (a1*b2 + a2*b1)/2] * h/3

Here the bottom is a1 x b1, and the top is a2 x b2, with the a's parallel and the b's parallel (this is important).

This is the formula we saw before.

An alternative version of the formula, using the average length and width, is:

V = [a1*b1 + a2*b2 + 4((a1+a2)/2 * (b1+b2)/2)] * h/6

Here a1*b1 is the area of the bottom rectangle, a2*b2 is the area of the top rectangle, and (a1+a2)/2 * (b1+b2)/2 is the area of the middle rectangle. The "middle" rectangle has sides that are the average of the sides of the top and bottom rectangles:

[Diagram: the same solid, with the middle rectangle, of sides (a1+a2)/2 and (b1+b2)/2, drawn halfway up]

It’s easy to transform the original formula to this one, with a little creativity:

$$V=\frac{h}{3}\left(a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right)\\=\frac{h}{6}\left({\color{Red}{2a_1b_1}}+{\color{Green}{2a_2b_2}}+a_1b_2+a_2b_1\right)\\=\frac{h}{6}\left({\color{Red}{a_1b_1}}+{\color{Green}{a_2b_2}}+({\color{Red}{a_1b_1}}+a_1b_2+a_2b_1+{\color{Green}{a_2b_2}})\right)\\=\frac{h}{6}\left(a_1b_1+a_2b_2+(a_1+a_2)(b_1+b_2)\right)\\=\frac{h}{6}\left(a_1b_1+a_2b_2+4\frac{a_1+a_2}{2}\cdot\frac{b_1+b_2}{2}\right)\\=\frac{B_1+B_2+4M}{6}h$$

This is a *weighted* average of the three areas, times the height!

In your case, a1 = 57, b1 = 35, a2 = 72, b2 = 12, h = 75

$$B_1=57\cdot35=1995\\B_2=72\cdot12=864\\M=\frac{57+72}{2}\cdot\frac{35+12}{2}=1515.75$$

$$V=\frac{h}{6}\left(B_1+B_2+4M\right)=\frac{75}{6}\left(1995+864+4\cdot1515.75\right)=111,525$$

For comparison, the frustum formula, wrongly applied here, would give

$$V=\frac{h}{3}(B_1+\sqrt{B_1B_2}+B_2)\\=\frac{h}{3}(a_1b_1+\sqrt{a_1b_1a_2b_2}+a_2b_2)\\=\frac{75}{3}(57\cdot35+\sqrt{57\cdot35\cdot72\cdot12}+72\cdot12)\\=\frac{75}{3}(1995+\sqrt{1723680}+864)\approx104,297$$
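The arithmetic for Greg’s shape is easy to reproduce. The following Python sketch uses the midsection form of the formula; the function name is my own, and the dimensions are the ones given in the question.

```python
import math

def volume_from_midsection(B1, B2, M, h):
    """V = (h/6) * (B1 + B2 + 4*M): a weighted average of three areas, times the height."""
    return h / 6 * (B1 + B2 + 4 * M)

a1, b1 = 57, 35   # front rectangle, in feet
a2, b2 = 72, 12   # rear rectangle, in feet
h = 75            # distance between them, in feet

B1 = a1 * b1                       # 1995
B2 = a2 * b2                       # 864
M = (a1 + a2) / 2 * (b1 + b2) / 2  # 1515.75, the middle rectangle

V = volume_from_midsection(B1, B2, M, h)
print(V)  # 111525.0 cubic feet

# The true-frustum formula, wrongly applied to this shape, comes out lower.
wrong = h / 3 * (B1 + math.sqrt(B1 * B2) + B2)
print(round(wrong))  # 104297
```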

In 2001, we got the following extension question, which was added to the page:

I have the same shape. Getting the volume is the easy part, but I need to know if I were to fill this shape with a liquid, how many gallons would there be per inch of height? I understand how to do this with a rectangular volume, but I can't get the vol/height ratio for this type of geometric solid. Thanks a lot!

I considered adding a couple questions of this sort about conical frusta last time, but the work in those answers wasn’t complete, and I struggled to finish it. This time, we’ll do it.

I answered:

Hi, Jason. If by "gallons per inch of height" you just mean the volume of a full container divided by its height, just use the formula (with dimensions in inches) to get cubic inches, divide by 231 to get gallons, and divide that by the height. But that doesn't mean much. For any container other than a cylinder, the ratio will vary with depth, and I think you want to vary the depth.

I'm going to assume that you mean you are partially filling the container, and want to know the volume at different levels. That is, you want the volume as a function of height. To review the situation, here's the picture:

[Diagram: a frustum-like solid with bottom rectangle a1 by b1, top rectangle a2 by b2, and height h]

V = [a1*b1 + a2*b2 + (a1*b2 + a2*b1)/2] * h/3

Again, this is the first version of the formula, which is a little simpler for our purposes here.

Let’s redraw it with the bottom being smaller as we expect here, and partially filled:

Now, the width in the "a" direction changes linearly from a1 to a2, making it

a = a1 + (a2-a1)k = (1-k)a1 + ka2

at kh units up from the bottom (where k varies from 0 to 1); likewise,

b = b1 + (b2-b1)k = (1-k)b1 + kb2

The volume of the solid with height kh, base a1 by b1 and top a by b is

V(k) = [a1*b1 + a*b + a1*b/2 + a*b1/2] * kh/3

= [a1*b1 + ((1-k)a1 + ka2)((1-k)b1 + kb2) + a1((1-k)b1 + kb2)/2 + ((1-k)a1 + ka2)b1/2] * kh/3

= [a1*b1 + (1-k)^2 a1*b1 + (1-k)ka1*b2 + (1-k)ka2*b1 + k^2 a2*b2 + (1-k)a1*b1/2 + ka1*b2/2 + (1-k)a1*b1/2 + ka2*b1/2] * kh/3

= [(1 + (1-k) + (1-k)^2)a1*b1 + ((1-k)k + k/2)(a1*b2 + a2*b1) + k^2a2*b2] * kh/3

= [(3-3k+k^2)a1*b1 + k^2a2*b2 + (3k-2k^2)(a1*b2 + a2*b1)/2] * kh/3

This gives the volume when it is filled to a fraction *k* of its height.

If we use the actual liquid level L instead of the ratio k, we can replace k = L/h and get V(L) = [(3h^2-3Lh+L^2)a1*b1 + L^2a2*b2 + (3Lh-2L^2)(a1*b2 + a2*b1)/2] * L/(3h^2) This gives volume as a function of depth.

$$V(L)=\frac{2(3h^2-3Lh+L^2)a_1b_1+2L^2a_2b_2+(3Lh-2L^2)(a_1b_2+a_2b_1)}{6h^2}L$$

As a quick check, when it is full, with \(L=h\), this becomes $$V(h)=\left[(3h^2-3h^2+h^2)a_1b_1+h^2a_2b_2+(3h^2-2h^2)\frac{a_1b_2+a_2b_1}{2}\right]\frac{h}{3h^2}\\=\left[h^2a_1b_1+h^2a_2b_2+h^2\frac{a_1b_2+a_2b_1}{2}\right]\frac{h}{3h^2}\\=\left[a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right]\frac{h}{3}$$ as expected.
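To turn this into “gallons at each inch of depth,” here is a hedged Python sketch. The tank dimensions are made up for illustration, and the function names are mine; the first function applies the full-volume formula to the filled portion, and the second is the closed form in terms of the level L, written with the cross term divided by 2 and the whole bracket times L/(3h^2).

```python
CUBIC_INCHES_PER_GALLON = 231  # US liquid gallon

def volume_to_depth(a1, b1, a2, b2, h, L):
    """Volume below liquid level L (0 <= L <= h), with the a1-by-b1 rectangle at the bottom."""
    k = L / h
    a = a1 + (a2 - a1) * k  # width of the liquid surface in the "a" direction
    b = b1 + (b2 - b1) * k  # width of the liquid surface in the "b" direction
    return L / 3 * (a1 * b1 + a * b + (a1 * b + a * b1) / 2)

def volume_to_depth_closed_form(a1, b1, a2, b2, h, L):
    """The same volume, expanded as a cubic polynomial in L."""
    bracket = ((3 * h * h - 3 * L * h + L * L) * a1 * b1
               + L * L * a2 * b2
               + (3 * L * h - 2 * L * L) * (a1 * b2 + a2 * b1) / 2)
    return bracket * L / (3 * h * h)

# Hypothetical tank, dimensions in inches: 20x10 bottom, 30x16 top, 24 deep.
tank = (20, 10, 30, 16, 24)
for depth in range(0, 25, 6):
    gallons = volume_to_depth(*tank, depth) / CUBIC_INCHES_PER_GALLON
    print(depth, round(gallons, 2))
```

Because the container widens toward the top, each additional inch of depth adds more gallons than the one before, which is why a single “gallons per inch” figure is not meaningful for this shape.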

This could be used for an actual rectangular frustum, too, making it a good general-purpose formula.

Here are the questions I mentioned that I skipped last time, which are about conical frustums:

Medicine Cup Frustum

Volume by Inch of a Cone-Shaped Tank

When adapted to a cone, changing the rectangles to circles, our formula becomes:

$$V(L)=\pi L\frac{(3h^2-3Lh+L^2)r_1^2+L^2r_2^2+(3Lh-2L^2)r_1r_2}{3h^2}\\=\pi L\frac{L^2(r_1-r_2)^2+3Lhr_1(r_2-r_1)+3h^2r_1^2}{3h^2}$$

That should work for these problems.
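As a sanity check on the conical case, the sketch below compares a closed form against a direct numerical integration of the circular cross-sections. The closed form is the one obtained by replacing a1*b1, a2*b2, and (a1*b2 + a2*b1)/2 with pi*r1^2, pi*r2^2, and pi*r1*r2; the cup dimensions and function names are hypothetical.

```python
import math

def cone_frustum_volume_to_depth(r1, r2, h, L):
    """Volume below level L in a conical frustum with bottom radius r1 and top radius r2."""
    bracket = ((3 * h * h - 3 * L * h + L * L) * r1 * r1
               + L * L * r2 * r2
               + (3 * L * h - 2 * L * L) * r1 * r2)
    return math.pi * L * bracket / (3 * h * h)

def volume_by_slices(r1, r2, h, L, slices=10000):
    """Midpoint-rule sum of thin circular slices, for comparison only."""
    dx = L / slices
    total = 0.0
    for i in range(slices):
        x = (i + 0.5) * dx
        r = r1 + (r2 - r1) * x / h  # the radius grows linearly with height
        total += math.pi * r * r * dx
    return total

# Hypothetical cup: bottom radius 1, top radius 2, height 3, filled to depth 1.2.
exact = cone_frustum_volume_to_depth(1.0, 2.0, 3.0, 1.2)
approx = volume_by_slices(1.0, 2.0, 3.0, 1.2)
print(exact, approx)

# When full, the result matches the usual frustum formula pi*h/3 * (r1^2 + r1*r2 + r2^2).
full = cone_frustum_volume_to_depth(1.0, 2.0, 3.0, 3.0)
print(math.isclose(full, math.pi * 3.0 / 3 * (1 + 2 + 4)))  # True
```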

A 2002 question finally asked about the calculus derivation of the original formula:

Frustum of a Pyramid with a Rectangular Base Back in 2000 you gave a solution you derived using calculus to someone wanting to know how to figure the volume of a rectangular based frustum of a pyramid see: Volume of a Frustum-Like Structure http://mathforum.org/dr.math/problems/cunningham.5.12.00.html V = a1b1 + a2b2 + (a1b2 + a2b1)/2 x h/3 I am an engineer with a water treatment agency and need to figure the amount of water per foot of elevation in our reservoirs that happen to have the same shape as previously described. To satisfy my curiosity, could you please send me a copy of your derivation? I have had up through differential equations in college and am a bit rusty, so lay it on me.

I answered with a reference to the last answer, some new discoveries I’d made, and a hint to the derivation:

Hi, Brad. You'll be interested in this later page, where I answered a question like yours about the volume contained by such a shape up to a given depth: Volume of a Trapezoidal Solid http://mathforum.org/dr.math/problems/greg.11.15.00.html

It will also be of interest to you that the shape under discussion is a special case of a more general shape I wasn't aware of at the time, called the prismoid, or, even more generally, a prismatoid: MathWorld - Eric Weisstein: http://mathworld.wolfram.com/Prismoid.html http://mathworld.wolfram.com/Prismatoid.html

The latter page gives the same formula as in my reference above, for the volume in this much more general case, where you just have two parallel polygonal bases joined to one another by straight edges: V = h/6 (A1 + 4M + A2) where h is the altitude, A1 and A2 are the areas of the bases, and M is the area of the cross section halfway between.

A prismatoid might, for example, have an irregular quadrilateral base and a triangular top, joined by straight line segments like this:

The polygon *M* will differ according to how vertices in the top and bottom are joined, so it plays an important role in the formula:

The fact that my formula (in this form) extends beyond the case I’d proved it for is not too surprising, but the extreme generality is amazing. And the formula can extend even further, to the general prismatoid.

Now we consider his actual question, deriving my original formula:

Now, my calculus derivation for our special case was based on the fact that the length and width are changing linearly with height. So the dimensions of the rectangular cross-section at height x will be

a = a1 + (a2-a1) * x/h

b = b1 + (b2-b1) * x/h

(To check these, see what they are at x=0 and x=h.) Then I integrated the product ab with respect to x, from 0 to h. It's a pretty easy integration. I later rederived the same formula by dissecting the shape into several prisms and pyramids. That takes a little more visualization and calculation, but no calculus.

We’ll carry this out below.

Brad succeeded with the hint, but needed another:

Thank you for the information. I was able to integrate it and come up with the same answer. My only other question is what principle or theorem you used for the a and b of the rectangular cross-section at any height x? a = a1 + (a2-a1) * x/h b = b1 + (b2-b1) * x/h Once again, thanks for your prompt response to my inquiry.

This was the easy part (sort of). I answered:

Hi, Brad. You can just draw a side view:

[Diagram: side view of a trapezoid with top width a1, bottom width a2, and height h, with a horizontal line drawn at distance x below the top; the bottom edge is split into lengths a1 and a2-a1]

You can see that the width at distance x from the top is a1 + (a2-a1)x/h, using similar triangles. (I've got this upside-down from the original labeling, but that doesn't affect the math.) Is that what you wanted?

Brad replied:

Aha, I see the light. I guessed you were probably using similar triangles but was having a hard time visualizing it for some reason. Thank you for your time and patience with a math-rusty engineer. You have been most helpful.

Now let’s do it. It turns out to look a lot like my dissection proof!

$$\int_0^h\left(a_1+\frac{a_2-a_1}{h}x\right)\left(b_1+\frac{b_2-b_1}{h}x\right)dx\\=\int_0^h\left[a_1b_1+\frac{a_1(b_2-b_1)}{h}x+\frac{b_1(a_2-a_1)}{h}x+\frac{(a_2-a_1)(b_2-b_1)}{h^2}x^2\right]dx\\=\left[a_1b_1x+\frac{a_1(b_2-b_1)}{h}\frac{x^2}{2}+\frac{b_1(a_2-a_1)}{h}\frac{x^2}{2}+\frac{(a_2-a_1)(b_2-b_1)}{h^2}\frac{x^3}{3}\right]_0^h\\=a_1b_1h+\frac{a_1(b_2-b_1)}{h}\frac{h^2}{2}+\frac{b_1(a_2-a_1)}{h}\frac{h^2}{2}+\frac{(a_2-a_1)(b_2-b_1)}{h^2}\frac{h^3}{3}\\=\frac{h}{6}\left[6a_1b_1+3a_1(b_2-b_1)+3b_1(a_2-a_1)+2(a_2-a_1)(b_2-b_1)\right]\\=\frac{h}{6}\left[{\color{Red}{6a_1b_1}}{\color{Green}{+3a_1b_2}}{\color{Red}{-3a_1b_1}}{\color{Blue}{+3a_2b_1}}{\color{Red}{-3a_1b_1}}+2a_2b_2{\color{Blue}{-2a_2b_1}}{\color{Green}{-2a_1b_2}}{\color{Red}{+2a_1b_1}}\right]\\=\frac{h}{6}\left[{\color{Red}{2a_1b_1}}{\color{Green}{+a_1b_2}}{\color{Blue}{+a_2b_1}}+2a_2b_2\right]\\=\frac{h}{3}\left[a_1b_1+a_2b_2+\frac{a_1b_2+a_2b_1}{2}\right]$$
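One way to see why the integral and the midsection formula must agree: the cross-sectional area is a quadratic function of height, and h/6 (A(0) + 4A(h/2) + A(h)) is one panel of Simpson’s rule, which integrates quadratics exactly. The Python sketch below checks this with exact fractions; the function names are mine, and the dimensions are borrowed from the first question (with height 4.5) purely as test values.

```python
from fractions import Fraction

def cross_section_area(x, a1, b1, a2, b2, h):
    """Area of the rectangular cross-section at height x above the a1-by-b1 base."""
    a = a1 + (a2 - a1) * x / h
    b = b1 + (b2 - b1) * x / h
    return a * b

def integrated_volume(a1, b1, a2, b2, h):
    """Closed form obtained by integrating the cross-section area from 0 to h."""
    return h / 3 * (a1 * b1 + a2 * b2 + (a1 * b2 + a2 * b1) / 2)

def simpson_volume(a1, b1, a2, b2, h):
    """One panel of Simpson's rule: h/6 * (A(0) + 4*A(h/2) + A(h))."""
    def area(x):
        return cross_section_area(x, a1, b1, a2, b2, h)
    return h / 6 * (area(0) + 4 * area(h / 2) + area(h))

# Exact rational arithmetic, so the comparison is not blurred by rounding.
dims = (Fraction(46), Fraction(21, 2), Fraction(73), Fraction(37), Fraction(9, 2))
print(integrated_volume(*dims) == simpson_volume(*dims))  # True
print(float(integrated_volume(*dims)))                    # 6627.375
```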

Finally, in 2008, I got a chance to generalize things even more:

Volume of a Prismatoid How can I calculate the volume of a pyramid for which the bottom end is rectangular and the top end is circular? I'm totally confused.

Here is an example of such a shape:

Do you see how this is essentially a prismatoid with a polygon at the top? A recent question about finding the area of such a shape was the inspiration for this series (though we were able to answer it with a rough approximation because the circle was very small).

I answered:

Hi, Ravi. This is not really a pyramid, but I picture it as a sheet-metal transition from a round pipe to a rectangular duct, which is formed by bending a series of triangles with one base in one end of the figure, and the vertex in the other. Is that right? The volume can be calculated using the formula for a prismatoid, which is the same kind of shape except that the top is a polygon rather than a circle: Prismatoid http://mathworld.wolfram.com/Prismatoid.html The formula is V = h/6 (A1 + 4M + A2) where A1 and A2 are the top and bottom areas, and M is the midsection area.

This time we’ll be using that general formula as our starting point.

In our case, if

r = radius of circle, L = length of rectangle, W = width of rectangle

it turns out that

A1 = pi r^2, A2 = LW, M = (L/2 + r)(W/2 + r) - r^2 + pi r^2/4

and so

V = h/3 (pi r^2 + LW + Lr + Wr)

That is such a nice formula, I imagine I am not the first to write it! See if this works for you. If you have any further questions, feel free to write back.

Here’s how I worked out the area *M*:

The midsection plane cuts each of the red edges halfway, so the horizontal and vertical edges are half of *L* and *W*. These are joined by quarter circles with radius \(\frac{r}{2}\), so the entire figure fits inside a rectangle with dimensions \(\frac{L}{2}+r\) by \(\frac{W}{2}+r\). To round off the corners, we can subtract four little squares each \(\frac{r}{2}\) by \(\frac{r}{2}\), whose total area is \(r^2\), and add on four quarter circles with radius \(\frac{r}{2}\), whose total area is \(\frac{\pi r^2}{4}\). Therefore, as shown, its area is $$M=\left(\frac{L}{2}+r\right)\left(\frac{W}{2}+r\right)-r^2+\frac{\pi r^2}{4}\\=\frac{LW}{4}+\frac{Lr}{2}+\frac{Wr}{2}+r^2-r^2+\frac{\pi r^2}{4}=\frac{LW+2Lr+2Wr+\pi r^2}{4}$$

Putting this into the prismatoid formula, $$V=\frac{h}{6}\left[LW+4\frac{LW+2Lr+2Wr+\pi r^2}{4}+\pi r^2\right]\\=\frac{h}{6}\left[LW+LW+2Lr+2Wr+\pi r^2+\pi r^2\right]\\=\frac{h}{6}\left[2LW+2Lr+2Wr+2\pi r^2\right]\\=\frac{h}{3}\left[LW+Lr+Wr+\pi r^2\right]$$
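The algebra can be double-checked numerically. In this Python sketch the duct dimensions and function names are hypothetical; it computes the volume once through the midsection area and the prismatoid formula, and once through the simplified result.

```python
import math

def midsection_area(L, W, r):
    """Rounded rectangle halfway up: (L/2 + r)(W/2 + r), corners trimmed to quarter circles of radius r/2."""
    return (L / 2 + r) * (W / 2 + r) - r * r + math.pi * r * r / 4

def prismatoid_volume(A1, A2, M, h):
    """V = h/6 * (A1 + 4*M + A2)."""
    return h / 6 * (A1 + 4 * M + A2)

def transition_volume(L, W, r, h):
    """Simplified result: V = h/3 * (L*W + L*r + W*r + pi*r^2)."""
    return h / 3 * (L * W + L * r + W * r + math.pi * r * r)

# Hypothetical transition piece: 12-by-8 rectangle, circle of radius 3, height 10.
L, W, r, h = 12.0, 8.0, 3.0, 10.0
via_midsection = prismatoid_volume(math.pi * r * r, L * W, midsection_area(L, W, r), h)
print(via_midsection, transition_volume(L, W, r, h))

# With r = 0 the circle shrinks to a point and the shape becomes a rectangular pyramid.
print(math.isclose(transition_volume(L, W, 0.0, h), L * W * h / 3))  # True
```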


We’ll start with the frustum of a pyramid, with this question from 2001:

Volume of the Frustum of a Pyramid I am trying to figure out how to derive the formula for the volume of a frustum of a pyramid. As it states in your formulas: V = (h(B1+B2+sqrt[B1B2])/3 I understand everything except where you get the sqrt[B1B2] - what does that part represent in the frustum?

A frustum is a solid that has been cut off from an object by a plane parallel to the base. Here is an example of a frustum of a square-based pyramid, the first example we’ll be looking at below:

We’ve cut the pyramid ABCDE with a horizontal plane, resulting in the square FGHI parallel to ABCD.

I answered:

Hi, Kerry. There are two main ways to derive this formula: dissection and subtraction. Let's try both.

Dissection means “cutting apart”; we’ll show that we can cut the frustum into pieces whose volumes add up to the desired formula. Subtraction means “taking away”; we’ll subtract the volume of the top part of the pyramid from the entire pyramid to find the volume of the remaining part, which is the frustum.

I'll use the dissection method specifically for a square frustum, from which you can apply the formula to other cross-sections using Cavalieri's theorem. Here is a top view of a square frustum:

    +---+----------+---+
    | \ |         d| / |
    +---+----------+---+
    |   |          | d |
    |   |          |   |
    |   |        s1|   |s2
    |   |          |   |
    |   |          |   |
    +---+----------+---+
    | / |         d| \ |
    +---+----------+---+

Here is the top view of my frustum above:

In my labeling, \(s_1\) is a side of the top (e.g. FG), \(s_2\) is a side of the bottom (e.g. AB), and \(\displaystyle d=\frac{s_2-s_1}{2}\) is the width of the extra space around the bottom. We’ll be calling the area of the top \(B_1\), and the area of the bottom \(B_2\).

Cavalieri’s theorem is discussed in Volume and Surface Area of a Sphere – Without Calculus. It says that if two figures have the same area in every cross-section (at any height from the base), then they have the same volume; so the formula we produce in terms of base areas will apply to the frustum of *any* pyramid (and even of a cone).
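As a rough numerical illustration of Cavalieri’s theorem, here is a small Python sketch that compares a square frustum with a circular one having the same base areas and height. The function names, the number of slabs, and the sample dimensions are illustrative assumptions, not part of the original answer; the sketch assumes only that every linear dimension of a cross-section changes linearly with height.

```python
import math

def square_slice_area(s_top, s_bottom, h, z):
    """Area of the square cross-section at height z above the bottom base."""
    side = s_bottom + (s_top - s_bottom) * z / h   # side length changes linearly
    return side ** 2

def circle_slice_area(r_top, r_bottom, h, z):
    """Area of the circular cross-section at height z above the bottom base."""
    radius = r_bottom + (r_top - r_bottom) * z / h
    return math.pi * radius ** 2

def volume_by_slabs(area_at, h, n=10_000):
    """Approximate a volume by adding up n thin slabs (midpoint rule)."""
    dz = h / n
    return sum(area_at((i + 0.5) * dz) for i in range(n)) * dz

# Sample dimensions (arbitrary): square frustum with sides 2 and 6, height 3.
s_top, s_bottom, h = 2.0, 6.0, 3.0
# Circular frustum with the same base areas: pi * r^2 = s^2.
r_top = s_top / math.sqrt(math.pi)
r_bottom = s_bottom / math.sqrt(math.pi)

v_square = volume_by_slabs(lambda z: square_slice_area(s_top, s_bottom, h, z), h)
v_circle = volume_by_slabs(lambda z: circle_slice_area(r_top, r_bottom, h, z), h)
print(v_square, v_circle)   # the two approximations agree closely
```

Because the slice areas match at every height, the two sums agree (up to rounding), which is exactly what the theorem predicts.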

We are dissecting the frustum into these parts:

I have cut vertically through the sides of the top square, dividing the frustum into nine parts:

    a central square prism with volume s1^2*h
    four corners that together form a pyramid of volume (2d)^2*h/3
    four triangular prisms (on their sides), each with volume s1*d*h/2

The total volume is then

    V = [s1^2 + 1/3 (s2-s1)^2 + s1(s2-s1)]h
      = [s1^2 + 1/3 s2^2 - 2/3 s1*s2 + 1/3 s1^2 + s1*s2 - s1^2]h
      = [1/3 s1^2 + 1/3 s2^2 + 1/3 s1*s2]h
      = [s1^2 + s1*s2 + s2^2]h/3

Since B1 = s1^2 and B2 = s2^2, this is

    V = [B1 + sqrt(B1*B2) + B2]h/3

So the formula for the volume of a square frustum whose bases have **sides** \(s_1\) and \(s_2\) is $$V=\frac{h}{3}(s_1^2+s_1s_2+s_2^2),$$ and the formula for the volume of any frustum whose bases have **areas** \(B_1\) and \(B_2\) is $$V=\frac{h}{3}(B_1+\sqrt{B_1B_2}+B_2),$$ which Kerry asked for.
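To see the dissection at work on actual numbers, here is a minimal Python sketch that adds up the nine pieces and compares the total with the formula in terms of base areas. The function names and the sample dimensions are assumptions chosen for illustration.

```python
import math

def frustum_by_dissection(s1, s2, h):
    """Add up the nine pieces of a square frustum (top side s1, bottom side s2)."""
    d = (s2 - s1) / 2                       # width of the border around the top square
    central_prism = s1 ** 2 * h             # square prism under the top face
    corner_pyramid = (2 * d) ** 2 * h / 3   # the four corners pushed together
    side_prisms = 4 * (s1 * d * h / 2)      # four triangular prisms on their sides
    return central_prism + corner_pyramid + side_prisms

def frustum_by_formula(b1, b2, h):
    """V = h/3 (B1 + sqrt(B1 B2) + B2), in terms of the base areas."""
    return h / 3 * (b1 + math.sqrt(b1 * b2) + b2)

s1, s2, h = 2.0, 6.0, 3.0                   # sample dimensions, chosen arbitrarily
print(frustum_by_dissection(s1, s2, h))     # pieces: 12 + 16 + 24
print(frustum_by_formula(s1 ** 2, s2 ** 2, h))
```

With these dimensions the pieces are 12, 16, and 24 cubic units, and both functions give a total of 52.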

Now let's do it by finding the difference between the whole pyramid and the part cut off. This doesn't depend on the shape of the bases at all. Just look at the similar triangles formed in a side view:

                   +         -----------------------
                  / \                   ^
                /     \                 |
               /       \                |k
             /           \              |
            /             \             v
          +-----------------+      -----------
         /         a         \          ^
       /                       \        |h
      /                         \       v
    +-----------------------------+    ---
                   b

It doesn't matter what a and b actually are; since horizontal cross-sections are all similar, any linear measurement is proportional to the square root of the area, so we know

    b/a = sqrt(B2/B1)

Areas of similar figures are proportional to the square of any linear dimension (edge, height, radius), so $$\frac{B_2}{B_1}=\left(\frac{b}{a}\right)^2,$$ and $$\sqrt{\frac{B_2}{B_1}}=\frac{b}{a}.$$

We’ll be assuming that we know the height of the frustum, \(h\), and the two base areas, \(B_1\) on the top and \(B_2\) on the bottom. We don’t know the height of the entire pyramid.

First we have to find the height of the original pyramid, using similar triangles:

    b/a = (h+k)/k = h/k + 1

so

    h/k = sqrt(B2/B1) - 1

    k/h = 1 / (sqrt(B2/B1) - 1)
        = sqrt(B1) / (sqrt(B2) - sqrt(B1))
        = sqrt(B1) (sqrt(B2) + sqrt(B1)) / (B2 - B1)

The last step was multiplication by the conjugate: $$\frac{k}{h}=\frac{1}{\sqrt{\frac{B_2}{B_1}}-1}\\\\=\frac{1\cdot\sqrt{B_1}}{\left(\sqrt{\frac{B_2}{B_1}}-1\right)\cdot\sqrt{B_1}}\\\\=\frac{\sqrt{B_1}}{\sqrt{B_2}-\sqrt{B_1}}\\\\=\frac{\sqrt{B_1}\left(\sqrt{B_2}+\sqrt{B_1}\right)}{\left(\sqrt{B_2}-\sqrt{B_1}\right)\left(\sqrt{B_2}+\sqrt{B_1}\right)}\\\\=\frac{\sqrt{B_1}\left(\sqrt{B_2}+\sqrt{B_1}\right)}{B_2-B_1}$$

Now the volume of the frustum is the volume of the whole pyramid minus the volume of the top part:

    V = B2(h+k)/3 - B1 k/3
      = B2 h/3 + (B2-B1)k/3
      = B2 h/3 + (B2-B1)h/3 * k/h
      = B2 h/3 + sqrt(B1) (sqrt(B2) + sqrt(B1)) h/3
      = [B2 + sqrt(B1B2) + B1]h/3

There, again, is the formula we wanted.
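The subtraction method can be sketched in Python in the same way: find the height \(k\) of the removed top from the ratio of linear dimensions, then subtract the two pyramid volumes. The function names and sample values are assumptions for illustration; here `b1` is the top area and `b2` the bottom area, as in this first derivation, and the sketch requires `b2 > b1`.

```python
import math

def frustum_by_subtraction(b1, b2, h):
    """Whole pyramid minus the removed top (b1 = top area, b2 = bottom area, b2 > b1)."""
    ratio = math.sqrt(b2 / b1)     # b/a, the ratio of corresponding lengths
    k = h / (ratio - 1)            # height of the removed top, from h/k = b/a - 1
    whole = b2 * (h + k) / 3       # volume of the complete pyramid
    top = b1 * k / 3               # volume of the part cut off
    return whole - top

def frustum_by_formula(b1, b2, h):
    """V = h/3 (B1 + sqrt(B1 B2) + B2)."""
    return h / 3 * (b1 + math.sqrt(b1 * b2) + b2)

b1, b2, h = 4.0, 36.0, 3.0         # sample values, chosen arbitrarily
print(frustum_by_subtraction(b1, b2, h))
print(frustum_by_formula(b1, b2, h))
```

For these values \(k=1.5\), the whole pyramid has volume 54, the removed top has volume 2, and both functions return 52. If the two bases are equal, the solid is a prism rather than a frustum, and the subtraction sketch divides by zero.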

I included a reference to two derivations for the cone, which we’ll refer to below, and added:

The latter method is equivalent to that given in the Dr. Math archives for a frustum of a cone (since a pyramid is just a special cone).

Is a pyramid really just a special cone? A cone, in its most general form, is formed by joining every point of a curve in a plane (the directrix) to a point called the apex; that is explained in the Ask Dr. Math FAQ:

If the directrix is a polygon, we call this a **pyramid**; if it is a circle, we call it a **circular cone**, which is the usual meaning of the word.

A 2002 question elicited a variation of the second derivation:

Volume of a Frustum of a Pyramid Hi Dr. Math. I don't know how to prove that the formula V = h/3 * (B1 + sqrt(B1*B2) + B2) is correct for a frustum of a pyramid.

I answered:

Hi, Zizza. Here is one way to derive the formula: If we reconstruct the entire pyramid, the top part, with base area B2, is similar to the whole pyramid, with base area B1. Their heights must therefore be in the ratio sqrt(B1):sqrt(B2). The height of the whole pyramid is therefore sqrt(B1)/(sqrt(B1)-sqrt(B2)) h, and its volume is

    Vwhole = 1/3 B1 * sqrt(B1)/(sqrt(B1) - sqrt(B2)) h

while the volume of the removed top part is

    Vremoved = 1/3 B2 * sqrt(B2)/(sqrt(B1) - sqrt(B2)) h

(Here I used \(B_1\) for the bottom base, and \(B_2\) for the top, reversing the notation from before.)

The two heights are the distances from the apex of the complete pyramid to the two bases of the frustum, called \(h+k\) and \(k\) above; so $$\frac{h}{k}+1=\frac{\sqrt{B_1}}{\sqrt{B_2}}\\\frac{h}{k}=\frac{\sqrt{B_1}}{\sqrt{B_2}}-1=\frac{\sqrt{B_1}-\sqrt{B_2}}{\sqrt{B_2}}\\\frac{k}{h}=\frac{\sqrt{B_2}}{\sqrt{B_1}-\sqrt{B_2}}\\k=\left(\frac{\sqrt{B_2}}{\sqrt{B_1}-\sqrt{B_2}}\right)h,$$

and the total height is $$h+k=h+\left(\frac{\sqrt{B_2}}{\sqrt{B_1}-\sqrt{B_2}}\right)h=\\\left(1+\frac{\sqrt{B_2}}{\sqrt{B_1}-\sqrt{B_2}}\right)h=\left(\frac{\sqrt{B_1}}{\sqrt{B_1}-\sqrt{B_2}}\right)h.$$ Multiplying each height by one third of the corresponding base area gives the volumes: $$V_{whole}=\frac{B_1}{3}\left(\frac{\sqrt{B_1}}{\sqrt{B_1}-\sqrt{B_2}}\right)h\\V_{removed}=\frac{B_2}{3}\left(\frac{\sqrt{B_2}}{\sqrt{B_1}-\sqrt{B_2}}\right)h.$$

Subtracting, we get

    Vfrustum = Vwhole - Vremoved
             = h/3 * [B1*sqrt(B1) - B2*sqrt(B2)]/[sqrt(B1) - sqrt(B2)]
             = h/3 * [sqrt(B1)^3 - sqrt(B2)^3]/[sqrt(B1) - sqrt(B2)]

So the volume of the frustum is $$V_{frustum}=\frac{h}{3}\left(\frac{\sqrt{B_1}^3-\sqrt{B_2}^3}{\sqrt{B_1}-\sqrt{B_2}}\right).$$

But the difference of cubes can be factored:

    a^3 - b^3 = (a - b)(a^2 + ab + b^2)

so we get

    Vfrustum = h/3 * (B1 + sqrt(B1*B2) + B2)

which is just what we wanted.

$$V_{frustum}=\frac{h}{3}\left(B_1+\sqrt{B_1B_2}+B_2\right).$$

Had you noticed the similarity of \(B_1+\sqrt{B_1B_2}+B_2\) to \(a^2+ab+b^2\)? This makes the formula quite memorable.
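The factoring step can also be spot-checked numerically. This Python sketch, whose function names and sample values are assumptions for illustration, compares the unfactored quotient with the final formula, using this second derivation's notation (\(B_1\) for the bottom, \(B_2\) for the top).

```python
import math

def frustum_by_quotient(b1, b2, h):
    """h/3 * (sqrt(B1)^3 - sqrt(B2)^3) / (sqrt(B1) - sqrt(B2)); requires b1 != b2."""
    a, b = math.sqrt(b1), math.sqrt(b2)
    return h / 3 * (a ** 3 - b ** 3) / (a - b)

def frustum_by_formula(b1, b2, h):
    """h/3 * (B1 + sqrt(B1 B2) + B2), the quotient after factoring a^3 - b^3."""
    return h / 3 * (b1 + math.sqrt(b1 * b2) + b2)

# A few arbitrary (bottom area, top area, height) triples.
for b1, b2, h in [(36.0, 4.0, 3.0), (10.0, 7.5, 2.0), (1.0, 0.25, 5.0)]:
    print(frustum_by_quotient(b1, b2, h), frustum_by_formula(b1, b2, h))
```

One practical advantage of the factored form is that it still makes sense when \(B_1=B_2\): it reduces to \(B_1h\), the volume of a prism, whereas the quotient would be \(0/0\).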

Now we’ll do some similar thinking for the cone.

Here is a 1999 question about both surface area and volume:

Volume and Surface Area of a Cone Frustum I have looked at your examples of the different types of cones, but I am unable to figure out how you derived the formula for the volume and total surface area for the frustum of a right circular cone.

Notice that our volume formula above applies to **any pyramid**, regardless of the shape of the base, or even whether the apex is above the center of the base (whatever that even means for a polygon). We could only find the **surface area** for such a wide variety of shapes by adding up the areas of the faces.

We’ll see that the same volume formula applies to a cone. But now we’ll be focusing on the **right circular cone**, with a circular base and with the apex above the center of the circle; this is specific enough that we’ll be able to find the surface area, too.

I answered, starting with the same two links I omitted above:

Hi, Chris. You haven't said where you saw a derivation of these formulas; I found these pages for the volume, and none for the area:

    Deriving the Volume of a Frustum
    http://mathforum.org/dr.math/problems/taylor5.6.98.html

    Derivation of the Formula for the Frustum
    http://mathforum.org/dr.math/problems/rizza08.09.99.html

I'm going to assume that you are happy with the formulas for a cone, and only want to see how we can get from there to the formulas for a frustum. If you want more than I give you, feel free to write back.

I’ll be expanding and clarifying those brief explanations here.

First let's do the volume. Rather than repeat what the other pages explain, I'll try a slightly different approach, still using similar triangles. Here's my picture:

        --------------------- +P --------------------
        |                    /|\                    |
        |                   / | \                   |
        |                  /  |  \ S-s              |H-h
        |                 /   |   \                 |
        |                /    |    \                |
        |               /*****|**r**\               |
        |             **     C+-----*+D--------------
       H|           S /  ***********  \             |
        |            /        |        \            |
        |           /         |         \           |
        |          /          |          \ s        |
        |         /           |           \         |h
        |        /            |            \        |
        |       /             |             \       |
        |      /    **********|**********    \      |
        |     /*****          |       R  *****\     |
        -----*               A+----------------+B----
              ******                     ******
                    *********************

We know R, r, and h, but not H, the total height of the cone from which the frustum was cut. If we can find it, then the volume of the frustum will be the volume of the whole cone, pi R^2 H/3, minus the volume of the cone we cut off the top, pi r^2 (H-h)/3.

We’ll find \(H\), which corresponds to \(h+k\) in the work above, in much the same way we did before.

The triangles PAB and PCD are similar, so we can write the equation

    AB/PA = CD/PC    or    R/H = r/(H-h)

Cross-multiplying [that is, multiplying both sides by H(H-h)], we get

    R(H-h) = rH

We can distribute the left side and collect H terms, then divide:

    RH - Rh = rH
    RH - rH = Rh
    (R-r)H = Rh
    H = Rh/(R-r)

Now we can subtract the top of the cone (with radius \(r\) and height \(H-h\)) from the whole (with radius \(R\) and height \(H\)).

Now let's write the volume formula and substitute this formula for H:

    V = (pi/3) R^2 H - (pi/3) r^2 (H-h)
      = (pi/3) (R^2 H - r^2 H + r^2 h)
      = (pi/3) [(R^2 - r^2) H + r^2 h]
      = (pi/3) [(R^2 - r^2) Rh/(R-r) + r^2 h]
      = (pi/3) [(R^2 - r^2) R/(R-r) + r^2] h

We can write R^2 - r^2 as (R - r)(R + r) and cancel:

      = (pi/3) [(R + r) R + r^2] h
      = (pi/3) [R^2 + Rr + r^2] h

That's the formula.

Here we factored a difference of squares, rather than a difference of cubes as before!

The result is the same as the general formula we got above for a pyramid: $$V_{frustum}=\frac{h}{3}\left(B_1+\sqrt{B_1B_2}+B_2\right)=\frac{h}{3}\left(\pi R^2+\sqrt{\pi R^2\pi r^2}+\pi r^2\right)=\frac{\pi h}{3}\left(R^2+Rr+r^2\right)$$
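Here is a short Python sketch of the cone calculation, comparing the subtraction of two cones with the final formula. The function names and the sample dimensions are assumptions for illustration; the sketch requires \(R>r\).

```python
import math

def cone_frustum_by_subtraction(R, r, h):
    """Volume of the whole cone minus the cone cut off the top (requires R > r)."""
    H = R * h / (R - r)                        # full height, from R/H = r/(H - h)
    whole = math.pi * R ** 2 * H / 3           # cone with radius R and height H
    top = math.pi * r ** 2 * (H - h) / 3       # removed cone with radius r
    return whole - top

def cone_frustum_by_formula(R, r, h):
    """V = (pi h / 3)(R^2 + R r + r^2)."""
    return math.pi * h / 3 * (R ** 2 + R * r + r ** 2)

R, r, h = 3.0, 1.0, 4.0                        # sample dimensions, chosen arbitrarily
print(cone_frustum_by_subtraction(R, r, h))
print(cone_frustum_by_formula(R, r, h))
```

With these dimensions \(H=6\), the whole cone has volume \(18\pi\), the removed cone has volume \(\frac{2\pi}{3}\), and both functions give \(\frac{52\pi}{3}\), about 54.45.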

Now let's work on the lateral surface area. The formula for a complete cone is:

    A = pi R S

where R is the radius and S is the slant height of the whole cone. For the frustum, we will subtract the area of the cut-off cone (whose slant height is S-s) from the whole:

    A = pi R S - pi r (S-s)
      = pi (RS - rS + rs)
      = pi ((R-r)S + rs)

We’ll soon see why I did that last step of factoring.

By the same similar triangles as before, we can write

    AB/PB = CD/PD    or    R/S = r/(S-s)

Again solving for S,

    R(S-s) = rS
    RS - Rs = rS
    RS - rS = Rs
    (R-r)S = Rs
    S = Rs/(R-r)

This is essentially the same as our formula for *H*.

Now the area is

    A = pi ((R-r) Rs/(R-r) + rs) = pi (Rs + rs) = pi(R+r)s

and we're done.

This is very much like the formula for area of a trapezoid, \(A=\frac{B+b}{2}h\), which is the area of a rectangle whose width is the average of the two “widths”, *B* and *b*. In fact, we can write the formula as $$A=2\pi s\frac{R+r}{2},$$ the lateral area of a cylinder whose height is the slant height *s* and whose radius is the average of the two radii of the frustum.
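The lateral area can be checked the same way. In this Python sketch the function names and sample dimensions are assumptions for illustration, and the helper `slant_height` is an addition not found in the original answer: it gets \(s\) from \(h\), \(R\), and \(r\) by the Pythagorean theorem applied to the side view.

```python
import math

def slant_height(R, r, h):
    """Slant height of the frustum: hypotenuse with legs h and R - r (added helper)."""
    return math.hypot(h, R - r)

def lateral_by_subtraction(R, r, s):
    """Lateral area of the whole cone minus that of the removed cone (requires R > r)."""
    S = R * s / (R - r)                       # full slant height, from R/S = r/(S - s)
    return math.pi * R * S - math.pi * r * (S - s)

def lateral_by_formula(R, r, s):
    """A = pi (R + r) s."""
    return math.pi * (R + r) * s

R, r, h = 4.0, 1.0, 4.0                       # sample dimensions, chosen arbitrarily
s = slant_height(R, r, h)                     # legs 4 and 3, so s = 5
print(lateral_by_subtraction(R, r, s))
print(lateral_by_formula(R, r, s))
```

Here both functions give \(25\pi\), about 78.54, which is also \(2\pi\cdot 2.5\cdot 5\), using the average radius 2.5.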

Next week: Shapes that *seem* like frustums, but aren’t. And after that: What if you cut a cone at an angle, rather than straight across?