This week, we’ll look at two recent questions about how parentheses (brackets) are used, how they relate to the properties we use in algebra that let us add or drop them, and the related concept of factoring a polynomial. They are examples of how student questions can touch on details teachers tend not to mention because they don’t think like a student. When we find out how a student thinks, we can correct misunderstandings that would never occur to us on our own!

The first question is from Teegan in early August:

Hi there,

I just had a general question about brackets in algebra for Gr. 10 Math, specifically polynomials.

My question can be phrased down to:

Is x(x+6) the same as (x)(x+6)?

Thank you so so much!

Teegan

It’s a very simple question, but likely hides some further uncertainty. I answered:

Hi, Teegan.

You ask,

Is x(x+6) the same as (x)(x+6)?

The quick answer is,

yes. Both mean the quantity x, multiplied by the quantity x+6.

Parentheses (brackets) simply mean that what they contain should be treated as a single quantity. So when they surround a single number or variable, they have no real effect. And when two quantities are written next to one another (juxtaposed), that implies that they are to be multiplied, regardless of whether there are parentheses around either.

If there is some particular reason you asked, feel free to show us that context, and we may have more to say.

So there are really two concepts here: the parentheses that *are* there, and the multiplication symbol that is *not*! Students often get confused here, thinking the parentheses somehow *mean* multiplication, or that their absence changes the meaning.

Teegan had more questions, as I expected, in addition to providing helpful context:

Thank you so much, this is really helpful.

The reason I asked was sort of for a more general question about the unit I am currently doing rather than a particular question, but to give some context; the unit is on factoring trinomials. I sometimes get confused about whether to apply what comes before the Parentheses to everything in the Parentheses (when multiplying polynomials) or to FOIL the expression. (Or if that is the same?)

Originally I applied what was before the Parentheses to the Parentheses when there were no Parentheses on the first expression.

e.g. 2xy(4x^2+6y)

And then used FOIL when there were two sets of Parentheses.

e.g. (2xy+7)(4x^2+6).

But perhaps this changes when there is more than one term in front of the Parentheses…

I became a bit more confused when I saw a step in part of a solution in my practice questions that went from ((2x-1) – 2) to (2x-3). That original question was:

(2x-1)^2 – 5(2x-1) + 6

Let’s say u = 2x-1

u^2 – 5u + 6

[(2x-1) – 2][(2x-1) – 3]

(2x-3)(2x-4)

I guess I am just a little confused about where Parentheses are multiplied together and where they can be dropped (like the above last step) and when there are no Parentheses – what to multiply – or if it is the same.

Thanks so much,

Teegan

“FOIL” stands for “First, Outside, Inside, Last”, and is a mnemonic for how we distribute a product of two binomials. Teegan is exactly right in the supposition that the real difference in distributing is whether there are two terms (a sum) inside the parentheses, or a product. I find many students learn to distribute so well that they do it when it is inappropriate; in fact, a student I was tutoring just this week did it.

I replied, starting with the last question, about dropping parentheses:

Thanks. Often, general questions are best asked in terms of examples, so this helps me see how you’re thinking.

Basically, in order to drop parentheses, you need to be applying one of the properties of operations, typically the associative properties,

(a+b)+c = a+(b+c) = a+b+c

(ab)c = a(bc) = abc

or the distributive property

a(b+c) = ab + ac

(a+b)c = ac + bc

I’ve written each of these a little differently than we usually teach them.

The difference in my presentation of the associative properties is the last part, where I dropped parentheses; and in the distributive property, it is that I showed it on both the left and the right.

(I used the plural for the *associative* properties, because there is one for addition and another for multiplication. I used a singular for the *distributive* property because it is one property, which applies to a combination of addition and multiplication; it is the same property when applied to either side, because of the *commutative* property.)

The associative properties themselves just say you can move the parentheses; but the order of operations tells us that, say, a+b+c means (a+b)+c because we do the left operation first, so the associative properties ultimately mean you can drop parentheses when the operations are the same (all multiplications, or all additions).

This is what applies to your example of (2x – 1) – 2: subtraction is really addition of the negative, so this means (2x + -1) + -2, and we can apply the associative property, moving the parentheses to make 2x + (-1 + -2), and then doing the addition: 2x + -3 = 2x – 3. With experience, you see subtraction as just a modified addition, and don’t need to write all this out.

So it is the order of operations that really permits dropping parentheses in \((a+b)+c\), because the parentheses there say to do exactly what the order of operations already says to do for \(a+b+c\).
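As a quick spot-check of Teegan's example (a sketch, not a proof), we can compare the three forms of the expression at several values of x. The function names here are made up for illustration; since all three are quadratics, agreeing at more than two points means they are the same polynomial.

```python
# Spot-check: the three forms from Teegan's practice question should agree.
def original(x):
    u = 2 * x - 1
    return u**2 - 5 * u + 6                        # (2x-1)^2 - 5(2x-1) + 6

def grouped(x):
    return ((2 * x - 1) - 2) * ((2 * x - 1) - 3)   # [(2x-1) - 2][(2x-1) - 3]

def simplified(x):
    return (2 * x - 3) * (2 * x - 4)               # parentheses dropped: (2x-3)(2x-4)

samples = range(-5, 6)
all_agree = all(original(x) == grouped(x) == simplified(x) for x in samples)
print(all_agree)
```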

I wrote the distributive property with the multiplication on either side, which is appropriate because of the commutative property, which says the multiplication and addition behave the same way in either order. The “FOIL” method is really just an extension of this idea, applying the basic property repeatedly:

(a+b)(c+d) = a(c+d) + b(c+d) = ac + ad + bc + bd

This is best thought of simply as “each times each”: to multiply two sums, we multiply each term of the first by each term of the second. The key idea is that this applies to sums of terms; and because of the order of operations, we need parentheses to write such an expression, grouping a sum together so we can then multiply it by something.

Some of the issues I’ve been discussing here are covered in

Order of Operations: Subtle Distinctions

Now I dealt with the first question, about parentheses around a product. I found this a very interesting question, because it feels so automatic to me to think of the “\(2xy\)” as a unit here that I wouldn’t even consider breaking it up; but in fact that would not be wrong:

Now, if the first factor is a product, like your 2xy(4x^2 + 6y), then you could, if you wanted, treat 2, x, and y separately, distributing them one at a time (y, then x, then 2); but that would be a waste of effort. What we would all do is to insert parentheses (at least mentally) seeing it as (2xy)(4x^2 + 6y), which we can do because of the associative property (or just order of operations, really); then we just have one quantity to distribute:

(2xy)(4x^2 + 6y) = (2xy)(4x^2) + (2xy)(6y) = 2xy4x^2 + 2xy6y = 8x^3y + 12xy^2

Here, once I had only multiplications in each term, I dropped the parentheses (by the associative property) and combined factors.

There’s a lot going on here, and it’s very easy for a teacher (who knows all this too well) to go faster than you are ready for, not explaining why each step is “legal”. That’s why asking questions is good.

Breaking up the product would mean taking multiple steps:

$$2xy(4x^2+6y)=2x(4x^2y+6y^2)\\=2(4x^3y+6xy^2)=8x^3y+12xy^2$$

It is the associative property that allows this … if you had a reason to do it.

You say,

I sometimes get confused about whether to apply what comes before the Parentheses to everything in the Parentheses (when multiplying polynomials) or to FOIL the expression. (Or if that is the same?)

Yes, these are really the same: the distributive property.

The important difference in your first pair of examples is whether something in parentheses is a product like 2xy (in which case you treat it all as a single term, which it is), or a sum like (2xy+7) (in which case you multiply each term by each term in the other).

Let’s do that second one, carrying out the “FOIL” by multiplying each term by each term:

$$(2xy+7)(4x^2+6y)=(2xy)(4x^2)+(2xy)(6y)+(7)(4x^2)+(7)(6y)=8x^3y+12xy^2+28x^2+42y$$
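The “each times each” idea can be sketched in a few lines of Python. The representation below (a term as a coefficient plus a dictionary of exponents) and the name `multiply` are just assumptions made for this illustration, not a standard library API.

```python
from collections import defaultdict

# A term is (coefficient, {variable: exponent}); a polynomial is a list of terms.
def multiply(p, q):
    """Multiply two polynomials by pairing each term of p with each term of q."""
    result = defaultdict(int)
    for coef_p, vars_p in p:
        for coef_q, vars_q in q:
            exps = defaultdict(int)
            for var, e in list(vars_p.items()) + list(vars_q.items()):
                exps[var] += e                   # x^a * x^b = x^(a+b)
            key = tuple(sorted(exps.items()))
            result[key] += coef_p * coef_q       # combine like terms
    return dict(result)

first = [(2, {"x": 1, "y": 1}), (7, {})]         # 2xy + 7
second = [(4, {"x": 2}), (6, {"y": 1})]          # 4x^2 + 6y
product = multiply(first, second)
for key, coef in product.items():
    print(coef, dict(key))
```

The same function handles the single-term case: passing `[(2, {"x": 1, "y": 1})]` as the first factor treats 2xy as one term and distributes it over the sum, which is the sense in which FOIL and ordinary distribution are the same property.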

You might think of parentheses as a tool you can use at will. Just as you might tie two objects together to make them easier to carry, you can put (or imagine) parentheses around a product when it is appropriate to treat them as a single unit (as directed by the properties and the order of operations); and you can untie them when you need to handle them separately. So parentheses are a piece of rope. Just be careful not to try to put the rope where it doesn’t fit!

The important thing is to preserve meaning. You can insert or remove parentheses when the operations inside and out are the same (associative); but when they are different, you would need to distribute.

Teegan closed:

Thank you so much – this was really helpful.

Three days later, we got another question on a related topic:

Hi there!

I was factoring the question:

3x(14-4y) + 15x(7-2y)

and the answer I keep getting is

27x(7-2y)

But the correct answer is

21x(7-2y)

This is what I did:

3x(14-4y) + 15x(7-2y)

First, I factored out the GCF, 3x(7-2y) and got:

3x(7-2y)(2+2) + 5

3x(7-2y)(9)

27x(7-2y)

In the solution of the correct answer, before factoring out the GCF – they simplified the polynomial, by dividing (14-4y) by 2. In order to make the quantities in the brackets match.

I see how they got 21x(7-2y) but I am just confused about why you cannot take the GCF out always – and what the difference is between simplifying first (which is technically still factoring out a factor – whether or not the greatest one)?

Thank you so much!

Shannon

This is the sort of question we love to get, clearly showing her thinking, so we don’t have to guess what went wrong! The GCF is right, but something went wrong in factoring it out.

I answered, stepping through what Shannon had shown and asking for further clarification:

Hi, Shannon.

You just need to be a little more careful in your work; taking more and smaller steps will help.

Here is your work, taken more slowly:

3x(14-4y) + 15x(7-2y)

Factor out the GCF, 3x(7-2y), from each term:

3x(7-2y)*2 + 3x(7-2y)*5

Factor out the GCF from the entire expression:

3x(7-2y)(2+5)

3x(7-2y)(7)

21x(7-2y)

You somehow doubled the 2. Perhaps you can explain your thinking in writing 2+2.

You also omitted parentheses in your work, at

3x(7-2y)(2+2)+5

If you expand this, you won’t be multiplying the 5 by anything, which surely is not what you intended. It appears, from the next line, that you meant

3x(7-2y)(2+2+5)

What I showed here is a good strategy for finding errors: do it again, but more slowly!

That’s the same thing I do to find something I’ve lost …
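Another way to catch this kind of slip is to substitute a few numbers into the original expression and into each proposed answer. The sketch below does that; the function names and the sample points are arbitrary choices for illustration.

```python
def original(x, y):
    return 3 * x * (14 - 4 * y) + 15 * x * (7 - 2 * y)

def correct(x, y):
    return 21 * x * (7 - 2 * y)

def attempted(x, y):
    return 27 * x * (7 - 2 * y)

points = [(1, 0), (2, 1), (-3, 5), (4, -2)]

# The correct factoring agrees with the original at every sample point ...
matches_correct = all(original(x, y) == correct(x, y) for x, y in points)
# ... while the attempted answer disagrees somewhere, so it cannot be right.
attempt_fails = any(original(x, y) != attempted(x, y) for x, y in points)
print(matches_correct, attempt_fails)
```

Agreement at a handful of points does not prove two expressions are equal, but a single disagreement is enough to show that something went wrong.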

Shannon replied,

Thank you so much for the quick reply – this is really helpful!

I think I wrote 2+2 as a result of dividing 14-4y by 7-2y but it sounds like I am dividing this incorrectly – as if I just had 2 then I would get the correct answer.

I have been dividing it by going 14/7 -4/-2 y/y

And then getting 2 + 2

Thanks again,

It appears, as I look at it again now, that her division is like this: $$\frac{14-4y}{7-2y}=\frac{14}{7}+\frac{-4}{-2}\frac{y}{y}=2+2\cdot 1$$ I didn’t quite see this at the time; this represents a common error, in which students “cancel” a fraction by crossing out anything in the top and bottom that matches, forgetting that you can only cancel a **factor of the entire numerator and denominator**, not a factor of a **term**. Similarly, you can divide by a monomial by dividing each term in the numerator by the single term in the denominator, as in $$\frac{14-4y}{2}=\frac{14}{2}+\frac{-4y}{2}=7-2y,$$ but what Shannon did is wrong.

I responded with two better ways to have done this:

Your dividing is sort of like dividing 144 by 72 and getting 22.

It will definitely be better to think of this as a factoring step rather than a division; or at least not doing the division in your head.

As factoring, it’s 14 – 4y = ?(7 – 2y), and seeing that ? = 2 works.

As (long) division, it’s

                    2
              ----------
    -2y + 7 ) -4y + 14
              -4y + 14
              --------
                     0

Or, to put it differently, when you divide, you should always check by multiplying. If you think that (14 – 4y)/(7 – 2y) = 2+2 = 4, you would check whether 4(7 – 2y) = 14 – 4y.

So I think of factoring as simply looking for a common factor and pulling it out (once!) from every term. Another way to express that would be to first break each term into a product, and then “undistribute” that common factor:

$$14-4y=2\cdot7-2\cdot2y=2(7-2y)$$

The long division is a considerably longer process; at the time I imagined Shannon doing this (rather than the incorrect canceling) but writing the 2 above each column.

In the check, of course, we would find that \(4(7-2y)=28-8y\), which is not what we want, and then fix it.
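The “check by multiplying” habit is easy to sketch in code. Here polynomials in y are stored as coefficient lists, constant term first; the helper names `poly_mul` and `check_quotient` are invented for this example, and the check assumes the division is supposed to come out exactly, with no remainder.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [constant, y, y^2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def check_quotient(dividend, divisor, quotient):
    """A proposed quotient is right only if quotient * divisor gives back the dividend."""
    return poly_mul(quotient, divisor) == dividend

dividend = [14, -4]   # 14 - 4y
divisor = [7, -2]     # 7 - 2y
print(check_quotient(dividend, divisor, [4]))   # the mistaken 2 + 2
print(check_quotient(dividend, divisor, [2]))   # the factor found by inspection
```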

Shannon closed:

Thank you for your help!

A recent question asked about a well-known problem about stacking books (or cards, or dominoes) so that the top one extends beyond the base, giving a link to one of many explanations of it – but one, like many, that doesn’t quite fill in all the details. Doctor Rick responded with a link to an answer he gave 5 years ago. That’s what we’ll look at today.

There are many versions of this problem; here is the one Kalyan asked about in mid-August, which is reportedly from an edition of Stewart’s Calculus text:

Suppose you have a large supply of books, all the same size, and you stack them at the edge of a table, with each book extending farther beyond the edge of the table than the one beneath it. Show that it is possible to do this so that the top book extends entirely beyond the table.

In fact, show that the top book can extend any distance at all beyond the edge of the table if the stack is high enough.

Use the following method of stacking: The top book extends half its length beyond the second book. The second book extends a quarter of its length beyond the third. The third extends one-sixth of its length beyond the fourth, and so on. (Try it yourself with a deck of cards.) Consider centers of mass.

In real life, you can just push each book (or block, or card) gently until it’s almost ready to tip over, starting from the top. Here’s what I got when I did that with some actual books:

It doesn’t really work as well as the theory!

It works a little better with dominoes, which are more rigid:

Our goal here is to prove that this can (in theory, though not in reality) be extended *infinitely far*.

The question we’ll look at, from 2017, was very similar to Kalyan’s question:

Card Stacking Problem, Redux

I'm unable to follow what is happening here:

Solution to "Stacking Dominoes"

Specifically, this passage:

For any n, the sum of the torques of the first n blocks has to be 0. Writing this out gives, without much difficulty, that if the i-th block is placed at x_i, then

How did the author get the equation? How do we make that jump "without much difficulty"? If x_(n-1) is the last displacement, then why get all of that instead of just a sum of all the displacements?

I understand the derivation immediately prior, a normal integration process; but I'm confused whether the author applies the same principle to take the next step. (When I tried to do that, I did not get the right answer.) Maybe the author is making up a whole new thing. But if I cannot grasp the next step conceptually, how can I grasp it mathematically?

I find the indexing notation confusing, too. Wouldn't using n itself make more sense than i?

The text leading up to the quoted paragraph says,

Given \(n\) dominoes stacked following this construction, we can take the whole stack and place it on top of one more brick without displacement, and it still won’t collapse. Shifting the whole stack of \(n\) on top of the new one, at a certain point it will topple over. We want to compute the location of this point, which is exactly the point at the bottom of the \(n^{th}\) domino around which the torque is 0. For the first domino this is 1/2, for the second 1/4, and after that you need pen and paper to proceed.

The torque of a block that is horizontally placed between the x-coordinates \(x_0\) and \(x_0+1\) is independent of the vertical position and is proportional to $$\int_{x_0}^{x_0+1}(x-a)dx=x_0-a+\frac{1}{2}.$$

This explanation relies heavily on physics and some calculus; but Nikola’s question is not about this part (which we’ll restate more simply below), but about the recursive formula including a summation. That’s the part we need to dig into.

Doctor Rick replied, first looking for references he might provide:

Hi, Nikola,

Thanks for writing to Ask Dr. Math.

That explanation does look rather hard to follow. In fact, even if you get past the part you asked about, it goes on to say, "From this one can compute (this is not entirely straightforward, but not too difficult either), ..." The author punts, admitting that the next part isn't easy ... before skipping over it anyway!

Maybe it will be more helpful if we find a better explanation of the problem and solution. Here is a page on a respected site that explains the problem nicely with pictures:

Book Stacking Problem. From MathWorld--A Wolfram Web Resource

But it skips over most of the solution process with the words "it turns out that ..."

The reader is, of course, expected to figure that out! That’s typical of that site, which is not a tutorial. Can we find a fuller explanation?

The following document goes very deeply into the problem, discussing the physics in detail ... before generalizing the problem to such a degree that it gets VERY complex:

Overhang, by Mike Paterson and Uri Zwick

In the archives of our own service, we do have one related conversation, which falls in between these two levels of explanation:

Card Stacking Problem

Even this, however, also glosses over the part you're asking about. I am having a hard time locating anything that gives a clear, thorough, simple explanation ...

... so I guess I'll have to write up something myself!

The question is: "How far can a stack of n blocks protrude over the edge of a table without the stack falling over?"

Rather than discussing torques, I will work with this idea: The stack of blocks (or books, or dominoes) will be stable if, for any number n of blocks counting from the top, the center of mass of that set of blocks is located over the block (or table) on which it rests. For maximum overhang, the center of mass will be directly over the *edge* of the next block down, or for the bottom block, the edge of the table.

Each block is uniform and symmetrical, so its center of mass is at the center of the block.

Here is our ideal block, with X marking its center of mass (CM):

Let's do as your source does, and take the width of the block to be 1 unit; and refer to the illustrations on the MathWorld page.

Here is the MathWorld picture:

A single block can protrude beyond the table by a distance 1/2, as shown in the top figure. That's because its center of mass is a distance 1/2 from the right edge of the block.

Here is that one block on the edge of the table:

At this exact location, it would be teetering on the edge, ready to fall off if a breeze pushed it a little. And if the corner were a little rounded, it would be beyond its actual support and would fall. We’re assuming mathematically perfect blocks and table!

Now, we put another block under this one, with its edge where the table's edge was. Where is the center of mass of the combination of two blocks? The center of mass of an assemblage of objects is at the weighted average of the centers of mass of the individual blocks. Let's work that out.

Here is how we find the center of mass of two objects:

The combined CM lies along the line joining the individual CMs, at a point whose distances from the two objects are inversely proportional to their masses, as we know from experience with see-saws:

Here $$\frac{d_1}{d_2}=\frac{m_2}{m_1}.$$ In other words, \(m_1d_1=m_2d_2\). In terms of physics, the torques must be equal and opposite.

Now, suppose the two objects are located at \(x_1\) and \(x_2\) along a number line, and the center of mass is at \(x_c\). Then $$m_1(x_c-x_1)=m_2(x_2-x_c)$$ Solving for \(x_c\), $$m_1x_c-m_1x_1=m_2x_2-m_2x_c$$ $$m_1x_c+m_2x_c=m_1x_1+m_2x_2$$ $$(m_1+m_2)x_c=m_1x_1+m_2x_2$$ $$x_c=\frac{m_1x_1+m_2x_2}{m_1+m_2}.$$ This is the weighted average of the two positions.
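Here is a small sketch of that weighted average in Python, using exact fractions. The masses and positions are made-up numbers chosen only to illustrate the formula, and `center_of_mass` is a name invented here.

```python
from fractions import Fraction

def center_of_mass(masses, positions):
    """Weighted average of positions, with the masses as weights."""
    total = sum(masses)
    return sum(m * x for m, x in zip(masses, positions)) / total

# Hypothetical see-saw: mass 3 at position 0, mass 1 at position 8.
m1, m2 = Fraction(3), Fraction(1)
x1, x2 = Fraction(0), Fraction(8)
xc = center_of_mass([m1, m2], [x1, x2])
print(xc)

# The balance condition m1*d1 = m2*d2 holds at the computed point.
balanced = m1 * (xc - x1) == m2 * (x2 - xc)
print(balanced)
```

The heavier object ends up closer to the balance point, as on a real see-saw.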

Now we apply this to two blocks:

Measuring from the right edge of the top block, the center of mass of the top block (which I'll call x[1]) is 1/2 unit from the right, as I said. The center of mass of the next block down is 1/2 unit to the left of its right edge, which in turn is x[1] = 1/2 unit to the left of the right edge of the top block. This is 1/2 + 1/2 = 1 (as you can see in the second figure).

Each block has the same mass, so their centers of mass are weighted equally in the weighted average. Thus, the center of mass of the top two blocks together is

x[2] = (1/2 + 1)/2 = 3/4

Note that this is where the edge of block number 3 will be placed.

$$x_2=\frac{x_1+1}{2}=\frac{\frac{1}{2}+1}{2}=\frac{3}{4}$$

We place the edge of the third block under the center of mass of the top two combined, so it will just balance:

Now, I'll generalize, going to x[n + 1]. Suppose we have x[n], the center of mass of the top n blocks. Block n + 1 has its right edge at x[n], and its center of mass is 1/2 unit to the left of that; that is, at x[n] + 1/2. Now we take the weighted average of this one block and the n blocks above it:

x[n + 1] = ((x[n] + 1/2) + n*x[n])/(n + 1)

= (x[n](1 + n) + 1/2)/(n + 1)

= x[n] + 1/(2(n + 1))

Here, for example, is how we find the center of mass of the first three blocks:

$$x_3=\frac{2\left(x_2\right)+1\left(x_2+\frac{1}{2}\right)}{2+1}=\frac{\left(3x_2+\frac{1}{2}\right)}{3}=\frac{\left(3\left(\frac{3}{4}\right)+\frac{1}{2}\right)}{3}=\frac{\;\frac{11}{4}\;}{3}=\frac{11}{12}$$

The added distance is \(\frac{1}{6}\).

That's a recursive definition for x[n]. Working out successive values of x[n], we find

x[1] = 1/2 (starting point of the recursion)

x[2] = 1/2 + 1/(2*2)

x[3] = 1/2 + 1/4 + 1/(2*3)

x[4] = 1/2 + 1/4 + 1/6 + 1/(2*4)

... and so on.

The successive increases (that is, the overhangs of each block) are \(\frac{1}{2}\), \(\frac{1}{4}\), \(\frac{1}{6}\), \(\frac{1}{8}\), and so on.

Thus, we see that $$x_n=\sum_{k=1}^n\frac{1}{2k}$$
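As a sanity check on that step, here is a sketch that computes the overhang both ways, once by repeating the weighted-average recursion and once by adding up the terms 1/(2k), and confirms that the two agree exactly. The function names are invented for this example.

```python
from fractions import Fraction

def overhang_recursive(n):
    """x[n] from the weighted-average recursion, starting at x[1] = 1/2."""
    x = Fraction(1, 2)
    for k in range(1, n):
        # Block k+1 has its centre at x + 1/2; average it with the k blocks above.
        x = (k * x + (x + Fraction(1, 2))) / (k + 1)
    return x

def overhang_sum(n):
    """x[n] as the sum of 1/(2k) for k = 1..n."""
    return sum(Fraction(1, 2 * k) for k in range(1, n + 1))

for n in range(1, 5):
    print(n, overhang_recursive(n), overhang_sum(n))
```

Exact fractions are used so that the comparison is not blurred by floating-point rounding.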

Notice that we can factor 1/2 out of all the terms, leaving:

x[n] = (1/2)(1 + 1/2 + 1/3 + 1/4 + ... + 1/n)

The sum in the last set of parentheses is known as the harmonic series. You can look up more information about this series; the key point is that it diverges. That is, if you choose a great enough value for n, you can make the finite sum (nth partial sum of the infinite harmonic series) grow as large as you wish. It takes a LOT of terms, even to reach 3, but it can be done.

... *in principle*. It's worth noting that everything said here requires ideal conditions. In practice, the blocks won't be perfect; a slight rounding of the corners would be sufficient to make it impossible to reach an overhang of 3.

Here I have simulated 35 blocks using a bar chart; you can see that the offset passes 1 after 4 blocks, and surpasses 2 after 31 blocks. To reach 3, you need 227 (not shown)!
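Those block counts can be reproduced with a short loop. This is only a sketch of the idealized model (perfect blocks, overhang given by half the harmonic sum); `blocks_needed` is a name made up for the example.

```python
from fractions import Fraction

def blocks_needed(target):
    """Smallest n for which (1/2)(1 + 1/2 + ... + 1/n) exceeds the target overhang."""
    total = Fraction(0)
    n = 0
    while total <= target:
        n += 1
        total += Fraction(1, 2 * n)   # block n contributes an overhang of 1/(2n)
    return n

for target in (1, 2, 3):
    print(target, blocks_needed(target))
```

Each extra unit of overhang takes roughly seven times as many blocks as the one before, which is why the divergence is so hard to see in practice.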

Extraneous roots can not only confuse the final solution to a problem; they can also make it harder to solve in the first place if you don’t deal with them early. Here is a relatively complicated rational equation, two questions about its solutions, and several ways to make it easier to solve. We’ll solve it half a dozen ways!

The question came from Akanksha in mid-August:

See this image:

The equation given in the question (initial equation) and the equation got after simplifying the initial equation which is the final equation (underlined as cubic equation) are one and the same. But if I calculate the value of x using the initial equation then I get x=(25-√17)/4; x=(25+√17)/4. On the other hand, I get three different values of x i.e. x=6; x=(25-√17)/4 and x=(25+√17)/4 for the final equation.

Question: why I got 1 extra value of x in final equation despite the two equations being exactly same because they should have same no. of values of x??And if I plug in the value of x=6 (got in final equation) in the initial equation then I get the value of equation as undefined!!?? Please explain.

My second question is that the equation given in the question is a linear equation because the highest degree of x is 1 but then also I have 2 values of x?? how??

Please clarify my two questions.

Thank you

Akanksha followed a very routine procedure, adding the fractions on each side (using common denominators in a method she called “cross-multiplication”), simplifying the equation (by a similar method also commonly called cross-multiplying), and ending up with a cubic equation to solve. She doesn’t say how she solved that, or how she solved the original equation (it turns out she used technology rather than completing the work herself); her initial question is not about how to solve, but where the extra solution came from. She also has a misunderstanding of the meaning of “linear”. We’ll answer those questions and more.

Doctor Rick started with questions about the work:

Hi, Akanksha, thanks for writing to the Math Doctors.

I have some answers for you, but I also have some questions:

(1) You say you “calculate the value of x using the initial equation”. How did you do that? I assume you didn’t get the answer directly from the original equation; you manipulated it somehow, just not the same way as the method you showed, right?

(I was able to solve it by a method of my own, which began by substituting y = x – 6 just to make the numbers smaller, but that’s not necessary. My method did give just the two correct solutions.)

(2) Your solution just skips from the cubic equation to the solutions. How did you solve the cubic? (I have an idea there too, but you may have done something different.)

The substitution is not something I had thought of; let’s try it:

Letting \(y=x-6\), we replace \(x\) with \(y+6\), changing $$\frac{x-3}{x-6}+\frac{x-8}{x-7}=\frac{x-7}{x-6}+\frac{x-6}{x-5}$$ to $$\frac{y+3}{y}+\frac{y-2}{y-1}=\frac{y-1}{y}+\frac{y}{y+1}.$$

Adding fractions on each side, as before, we get $$\frac{(y+3)(y-1)+(y-2)y}{(y-1)y}=\frac{(y-1)(y+1)+y^2}{y(y+1)}$$ $$\frac{2y^2-3}{y^2-y}=\frac{2y^2-1}{y^2+y}$$

Cross-multiplying as Akanksha did (though that isn’t what I would normally do), we get $$(2y^2-3)(y^2+y)=(2y^2-1)(y^2-y)$$ $$2y^4+2y^3-3y^2-3y=2y^4-2y^3-y^2+y$$ $$4y^3-2y^2-4y=0$$

One benefit of the substitution, perhaps intended, is that because one factor became simply *y*, we can easily see that we can divide by *y* (which is legal because \(y=0\) can’t be a solution); in fact, we can divide by \(2y\): $$2y^2-y-2=0$$ This can be solved by the quadratic formula, yielding $$y=\frac{1\pm\sqrt{(-1)^2-4(2)(-2)}}{2(2)}=\frac{1\pm\sqrt{17}}{4}$$ To find *x*, we add 6 to this and get $$x=\frac{1\pm\sqrt{17}}{4}+6=\frac{25\pm\sqrt{17}}{4}$$ Akanksha had the right answer. And we have a preview of ways to eliminate that extra solution.
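We can check those two values numerically by plugging them back into the original equation. The sketch below uses floating-point arithmetic, so it compares the two sides with a tolerance rather than exact equality; `lhs` and `rhs` are just names for the two sides.

```python
import math

def lhs(x):
    return (x - 3) / (x - 6) + (x - 8) / (x - 7)

def rhs(x):
    return (x - 7) / (x - 6) + (x - 6) / (x - 5)

# Roots of 2y^2 - y - 2 = 0 by the quadratic formula, then x = y + 6.
a, b, c = 2, -1, -2
disc = b * b - 4 * a * c
ys = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
xs = [y + 6 for y in ys]

for x in xs:
    print(x, math.isclose(lhs(x), rhs(x), rel_tol=1e-9))
```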

(Alternatively, we could have factored out \(2y\), obtaining \(y=0\) (i.e. \(x=6\)) as an extraneous solution; Doctor Rick evidently didn’t do that, since he got only the valid solutions.)

We’ll see ways to solve the cubic later.

Now for some answers!

(1) Why did you get an extra solution when you solved the equation the way you showed?

We call the 6 you got an extraneous root. One common way that extraneous roots appear is when, in the course of the solution, we multiply by a number that could be zero.

Look at the denominators in the last equation you got before “cross-multiplying”: each has (x – 6) as a factor. (That’s easy to see by looking up two lines to before you expanded the denominators.) Now, when you “cross-multiply”, what you are really doing is multiplying each side of the equation by the product of the two denominators. If x = 6, then you are multiplying by zero! That is an invalid step in solving equations. Whatever the values of the numerators, the equation you got after the multiplication is 0*(numerator1) = 0*(numerator2), or 0 = 0, if x = 6. Thus x = 6 is a solution of the new equation.

But looking at the original equation, when we set x = 6, two of the fractions have 0 in the denominator! That’s not allowed; we can’t divide by zero. So when x = 6, the original equation becomes, not 0 = 0, but (no number) = (no number). The (original) equation is not true when x = 6, so 6 is not one of its solutions.

This is closely related to what happened when we did the substitution above: We had *y* in the denominators on both sides, and after cross-multiplying, we had *y* on both sides as a factor. If \(y=0\) (that is, \(x=6\)), then that equation, but not the original, would be true. What I did above was to divide both sides by *y*, eliminating this problem! We’ll see below that we could have just avoided multiplying by this *y* (or \(x-6\)) in the first place.
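The difference between the two equations at \(y=0\) can be seen directly in a short sketch, using the substituted form of the equation from above. The function names are invented here; exact fractions are used so that “equal” really means equal.

```python
from fractions import Fraction

def cross_multiplied(y):
    """Both sides of (2y^2 - 3)(y^2 + y) = (2y^2 - 1)(y^2 - y)."""
    return (2 * y**2 - 3) * (y**2 + y), (2 * y**2 - 1) * (y**2 - y)

def original_sides(y):
    """Both sides of the equation in y before any multiplication."""
    left = (y + 3) / y + (y - 2) / (y - 1)
    right = (y - 1) / y + y / (y + 1)
    return left, right

y = Fraction(0)                       # corresponds to x = 6
left, right = cross_multiplied(y)
print(left == right)                  # the multiplied equation holds here

try:
    original_sides(y)
    defined = True
except ZeroDivisionError:
    defined = False                   # the original has y in a denominator
print(defined)
```

So the value satisfies the multiplied equation while the original equation cannot even be evaluated there, which is exactly what makes it extraneous.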

When you claim that the original equation is “exactly the same” as the cubic equation, do you mean that they are equivalent? When we manipulate an equation step by step, generally each step produces an equation equivalent to the previous equation, meaning that it has exactly the same roots. That’s the whole idea of solving an equation: we want to end up with an equation that is equivalent to the original, but whose solution is obvious, like x = 3.

But there are certain things we might do that do not produce an equivalent equation; multiplying by a quantity that might be 0 is one of those things. Take a look at our blog on this topic:

That page is very much worth reading at this point. Also, for alternative ways to solve this sort of equation, see

Many Ways to Solve a Proportion

(2) Why does the original equation have two solutions if it is a linear equation?

The answer is that the equation is not a linear equation. You seem to have too loose a definition of a linear equation. A linear equation is an equation between polynomials (a sum of terms, each of the form ax^n, that is, a constant times a power of the variable), in which the greatest power of x is 1. Your equation is not between polynomials, because there are variables in the denominator, which a polynomial cannot have.

What it is, is a **rational** equation whose numerators and denominators, in the original form, are linear; the equation as a whole is equivalent to a cubic equation; that better explains its behavior. (We’ll see that a quadratic equation works even better.)

Doctor Fenton had a different suggestion for simplifying the problem:

Another thing you can do with terms like these is to write

$$\frac{x-3}{x-6}=\frac{(x-6+6)-3}{x-6}=\frac{(x-6)+(6-3)}{x-6}=\frac{(x-6)+3}{x-6}=\frac{x-6}{x-6}+\frac{3}{x-6}=1+\frac{3}{x-6}$$

If you do this for each term and simplify, you get a much simpler equation

$$\frac{3}{x-6}-\frac{6}{x-2}=\frac{1}{x-6}-\frac{1}{x-5}.$$

Essentially what he has done is to carry out each division, producing “mixed numbers” where all the fractions are proper. We’ve reduced the degree of the numerators. Let’s continue the work:

$$\frac{x-3}{x-6}+\frac{x-8}{x-7}=\frac{x-7}{x-6}+\frac{x-6}{x-5}$$

$$\left(1+\frac{3}{x-6}\right)+\left(1-\frac{1}{x-7}\right)=\left(1-\frac{1}{x-6}\right)+\left(1-\frac{1}{x-5}\right)$$

$$2+\frac{3}{x-6}-\frac{1}{x-7}=2-\frac{1}{x-6}-\frac{1}{x-5}$$

$$\frac{3}{x-6}-\frac{1}{x-7}=-\frac{1}{x-6}-\frac{1}{x-5}$$

(It looks like Doctor Fenton misread a 7 as a 2, and also made a little sign error.)

There are many routes from here; for now I’ll use something like Akanksha’s approach and combine fractions first, starting with the two with the same denominator:

$$\frac{3}{x-6}+\frac{1}{x-6}=\frac{1}{x-7}-\frac{1}{x-5}$$

$$\frac{4}{x-6}=\frac{1}{x-7}-\frac{1}{x-5}$$

$$\frac{4}{x-6}=\frac{(x-5)-(x-7)}{(x-7)(x-5)}$$

$$\frac{4}{x-6}=\frac{2}{(x-7)(x-5)}$$

$$4(x-7)(x-5)=2(x-6)$$

$$2(x^2-12x+35)=1(x-6)$$

$$2x^2-25x+76=0$$

$$x=\frac{25\pm\sqrt{25^2-4(2)(76)}}{2(2)}=\frac{25\pm\sqrt{17}}{4}$$

as we saw before. There was no extraneous solution this time, at least in part because I combined the fractions with the same denominator.

Akanksha replied, first acknowledging that “solve” meant using technology:

In reply To Rick Peterson Sir

1) I used calculator on Google (**equation solver calculator**) to find out the value of x. I input the first equation as the original one and got **2 values of x**. Next, I entered the final equation in the input field and got **3 values of x**.

2) I used the equation solver calculator to calculate x in cubic equation.

I am satisfied with your answer that x=6 is an extraneous solution.

In case I don’t want to use any other method (like factorise), **can I use cross multiplication??** because **I don’t know** that the equation by which I am multiplying has **x as 0** and that is an invalid step.

Can you help me in concluding **whether the original equation = final equation??**

Did the steps I use are correct to find out the value of x as **x=(25-√17)/4 and x=(25+√17)/4??** (x=6 is extraneous solution)

There is just a small misunderstanding of the validity of cross-multiplication to be clarified.

Doctor Rick responded:

Thanks, now I see that you didn’t really solve either equation yourself; you let technology do it for you. I wondered about that.

Clearly you’ve learned some techniques for solving rational equations, but this one led you to a difficult equation, a cubic.

There are ways to solve the cubic “by hand”, as I mentioned; the **Rational Root Theorem** provides a way to list every rational number that could possibly be a root of a polynomial, and then there is a nice way (“synthetic division”) to **test each possible rational root**. Unfortunately, your cubic equation isn’t very nice; it has something like 32 possible rational roots!

So it’s good to know a few other techniques or tricks, and try different things. We have suggested a few.

The Rational Root Theorem tells us that if our cubic equation, which we can first simplify, by dividing by 2, to $$2x^3-37x^2+226x-456=0,$$ has any rational roots, then (in lowest terms) the numerator must be a factor of 456 (namely, 1, 2, 3, 4, 6, 8, 12, 19, 24, 38, 57, 76, 114, 152, 228, or 456), and the denominator must be a factor of 2 (namely 1 or 2). Counting signs, this gives the following possible roots: $$1, 2, 3, 4, 6, 8, 12, 19, 24, 38, \\57, 76, 114, 152, 228, 456, \\\frac{1}{2}, \frac{3}{2}, \frac{19}{2}, \frac{57}{2}$$ and their negatives (40 in all).

We can then divide the polynomial by each of these in turn, and when we get to the divisor \((x-6)\), we find that the quotient is \(2x^2-25x+76\), and we’ve already found its solutions. Throwing away the extraneous root 6, we have the answer.

And we could in fact have known to divide first by \((x-6)\), by noting that it was a factor of both denominators. It would be even better if we had just eliminated it in the first place!
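The listing and testing just described can be sketched in a few lines of Python. This is only an illustration, not part of the original exchange; the function names `divisors`, `rational_root_candidates`, and `synthetic_division` are my own, and the sketch assumes integer coefficients with a nonzero constant term.

```python
from fractions import Fraction

def divisors(n):
    # Positive divisors of a positive integer
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_root_candidates(coeffs):
    # coeffs: integer coefficients, highest power first (assumed nonzero ends)
    lead, const = abs(coeffs[0]), abs(coeffs[-1])
    cands = {Fraction(sign * p, q)
             for p in divisors(const)
             for q in divisors(lead)
             for sign in (1, -1)}
    return sorted(cands)

def synthetic_division(coeffs, r):
    # Divide by (x - r); returns (quotient coefficients, remainder)
    row = [coeffs[0]]
    for c in coeffs[1:]:
        row.append(c + r * row[-1])
    return row[:-1], row[-1]

cubic = [2, -37, 226, -456]   # 2x^3 - 37x^2 + 226x - 456
for r in rational_root_candidates(cubic):
    quotient, remainder = synthetic_division(cubic, r)
    if remainder == 0:
        # Only a true rational root leaves remainder 0
        print("root:", r, "quotient coefficients:", quotient)
```

The only candidate that survives is 6, and the quotient coefficients are those of \(2x^2-25x+76\), as found above.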

But on to the questions you asked:

Can you help me in concluding whether the original equation = final equation?

You already know that your original and final equations are not equivalent (that’s the proper term, not “equal” or “the same equation”), because **they do not have identical solution sets**. But probably what you are asking is how to tell whether they are equivalent **without solving both**! As the blog I showed you says, we need to be able to recognize what sort of steps might produce an extraneous root (or, on the other hand, miss a root). The first “risky step” listed in the blog is the one you did:

**multiplying by an expression containing the variable**, which you do in solving **rational equations**, and which might result in unintentionally multiplying by zero;

Then you asked,

In case I don’t want to use any other method (like factorise), can I use cross multiplication?? because I don’t know that the equation by which I am multiplying has x as 0 and that is an invalid step.

The problem with your solution was multiplying by a quantity that **could** be zero. (It isn’t that x is zero, but that there is some value of x for which the **multiplier** is zero. That value of x turns out to be 6.) We don’t need to **know** this will happen, just to realize that it **might** happen. As the quote above says, just multiplying by an expression that contains the variable raises this possibility!

So the main issue with extraneous roots is not to **avoid** making them (though we sometimes can), but to **recognize** when they may exist, so we can check them. (Other methods we’ve shown can avoid it in this case, but that is not always possible.)

You ask, what if you want to use this risky method? Sometimes we really **can’t find** another method! And sometimes the risky method is the quickest, so we **choose** to do it anyway. There is nothing wrong with this. However, when we’re done, we need to **check all solutions** in the original equation to see whether they really make the equation true. That’s the main point of the blog. We aren’t done until we do that.

In your problem, as I already pointed out, if you plug x = 6 into the original equation, you’ll get two fractions with 0 in the denominator, and that means the solution is extraneous. The other two solutions will work fine.

But in one sense, you don’t even need to recognize the possibility:

Now, what if you **don’t notice** that what you are doing could introduce an extraneous root (or lose a true root)? The fact is, we should **check all the solutions anyway**. That’s a good practice all the time, because extraneous or missing roots are not the only way we can go wrong! We might have made some careless error, like getting a sign wrong, or even multiplying 2 by 3 and getting 5. It happens! We all make mistakes, so we all need to check.

As I tell students, sometimes you have to check because it’s really part of the work; other times you should check merely because you’re human. In either case, check! Especially when it counts.
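Checking is easy to automate. Here is a minimal sketch (my own illustration, with a hypothetical helper named `check`) that plugs each candidate into the original equation, first looking for a zero denominator and then comparing the two sides numerically. Because the irrational roots are stored as floating-point numbers, the comparison uses a small tolerance rather than exact equality.

```python
import math

def check(x, tol=1e-9):
    # Each term of the original equation as (numerator, denominator)
    left = [(x - 3, x - 6), (x - 8, x - 7)]
    right = [(x - 7, x - 6), (x - 6, x - 5)]
    # A zero denominator means the candidate is not in the domain at all
    if any(d == 0 for _, d in left + right):
        return "extraneous (zero denominator)"
    lhs = sum(n / d for n, d in left)
    rhs = sum(n / d for n, d in right)
    return "valid" if math.isclose(lhs, rhs, rel_tol=tol) else "not a solution"

# The three roots of the cubic obtained by cross-multiplying
candidates = [6, (25 - math.sqrt(17)) / 4, (25 + math.sqrt(17)) / 4]
for x in candidates:
    print(x, check(x))
```

The candidate 6 is rejected because of the zero denominators, while the other two pass.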

Did the steps I use are correct to find out the value of x as x=(25-√17)/4 and x=(25+√17)/4?? (x=6 is extraneous solution)

Well, they are correct as far as they go, **except** that they produce an extraneous solution! Therefore your work **isn’t complete** until you do the checks I just mentioned.

And your solution is **not the best**, because there are ways to solve the problem without getting a cubic equation. But we can’t really know that we’re not going down the best path until we get in trouble. Finding a better way takes experience, so I won’t fault you for not finding one on the first try.

So, how can we solve it without getting the cubic and its three solutions? We’ve seen a couple; I saw another and jumped into the discussion to show it:

If I may add yet another suggestion, here is what I would have done (because I would not have thought of Doctor Fenton’s nice idea!): When I got to

$$\frac{(x-3)(x-7)+(x-6)(x-8)}{(x-6)(x-7)}=\frac{(x-7)(x-5)+(x-6)^2}{(x-6)(x-5)}$$

I would see that (x-6) is in **both denominators**, and that 6 is not a valid solution, so I would immediately multiply both sides by (x-6), leaving

$$\frac{(x-3)(x-7)+(x-6)(x-8)}{x-7}=\frac{(x-7)(x-5)+(x-6)^2}{x-5}$$

After cross-multiplying and simplifying, this leaves a quadratic (whose solution is easy, and correct).

Here is what happens if we cross-multiply now:

$$\frac{(x-3)(x-7)+(x-6)(x-8)}{x-7}=\frac{(x-7)(x-5)+(x-6)^2}{x-5}$$

$$\frac{(x^2-10x+21)+(x^2-14x+48)}{x-7}=\frac{(x^2-12x+35)+(x^2-12x+36)}{x-5}$$

$$\frac{2x^2-24x+69}{x-7}=\frac{2x^2-24x+71}{x-5}$$

$$(2x^2-24x+69)(x-5)=(2x^2-24x+71)(x-7)$$

$$2x^3-34x^2+189x-345=2x^3-38x^2+239x-497$$

$$4x^2-50x+152=0$$

$$2x^2-25x+76=0$$

$$x=\frac{25\pm\sqrt{17}}{4}$$

We never saw a cubic, or an extraneous root.

Another similar way is to use the trick I included in my version of Doctor Fenton’s approach: We can see that two of the denominators are the same, and combine those fractions at the start. Here is a version of that approach:

$$\frac{x-3}{x-6}+\frac{x-8}{x-7}=\frac{x-7}{x-6}+\frac{x-6}{x-5}$$

Subtract one fraction from each side:

$$\frac{x-3}{x-6}-\frac{x-7}{x-6}=\frac{x-6}{x-5}-\frac{x-8}{x-7}$$

Combine each pair of fractions, using a common denominator (which is easy on the left this time):

$$\frac{(x-3)-(x-7)}{x-6}=\frac{(x-6)(x-7)-(x-8)(x-5)}{(x-5)(x-7)}$$

$$\frac{4}{x-6}=\frac{(x^2-13x+42)-(x^2-13x+40)}{(x-5)(x-7)}$$

$$\frac{4}{x-6}=\frac{2}{(x-5)(x-7)}$$

This simplified far more than we were expecting, due to the particular numbers! Now we can divide both sides by 2 to simplify the equation, and then cross-multiply:

$$\frac{2}{x-6}=\frac{1}{(x-5)(x-7)}$$

$$2(x-5)(x-7)=1(x-6)$$

$$2x^2-24x+70=x-6$$

$$2x^2-25x+76=0$$

As before, we get two solutions from the quadratic equation, and nothing extraneous.

In general, looking for such simplifications may both **prevent extraneous solutions**, and **make the work easier**.

Another way, incidentally, is rather than cross-multiplying, to multiply both sides by the **LCD**. This will have similar results.

The extraneous solution you got is a common effect of cross-multiplication, and therefore a reason to prefer the LCD.

Here is the work using the LCD:

$$\frac{x-3}{x-6}+\frac{x-8}{x-7}=\frac{x-7}{x-6}+\frac{x-6}{x-5}$$

The LCD is \((x-6)(x-7)(x-5)\); we don’t need two of the same factor \((x-6)\). (That’s what’s going to help!) Multiplying every term by this cancels each denominator and leaves

$$(x-3)(x-7)(x-5)+(x-8)(x-6)(x-5)=(x-7)(x-7)(x-5)+(x-6)(x-6)(x-7)$$

$$(x^2-10x+21)(x-5)+(x^2-14x+48)(x-5)=(x^2-14x+49)(x-5)+(x^2-12x+36)(x-7)$$

$$(x^3-15x^2+71x-105)+(x^3-19x^2+118x-240)=(x^3-19x^2+119x-245)+(x^3-19x^2+120x-252)$$

$$2x^3-34x^2+189x-345=2x^3-38x^2+239x-497$$

$$4x^2-50x+152=0$$

$$x=\frac{25\pm\sqrt{17}}{4}$$
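Expansions like these are where sign slips tend to creep in, so it can be worth letting a few lines of Python redo the multiplication. This is a sketch of my own, representing each polynomial as a list of coefficients (highest power first); the helper names `poly_mul`, `poly_add`, and `product` are assumptions for illustration.

```python
def poly_mul(p, q):
    # Multiply polynomials given as coefficient lists, highest power first
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    # Add two polynomials of the same degree
    return [a + b for a, b in zip(p, q)]

def product(*roots):
    # Expand (x - r1)(x - r2)... for the given roots
    result = [1]
    for r in roots:
        result = poly_mul(result, [1, -r])
    return result

# Both sides after multiplying through by the LCD (x-6)(x-7)(x-5)
left = poly_add(product(3, 7, 5), product(8, 6, 5))
right = poly_add(product(7, 7, 5), product(6, 6, 7))
difference = [a - b for a, b in zip(left, right)]
print(left, right, difference)
```

The cubic terms cancel in `difference`, leaving the coefficients 4, -50, 152 of the quadratic above.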

We’ve seen several tricks that are specific to this problem, and other more generic ideas that may help with many rational equations. Have fun applying them!

Some problems can be done either by algebra or by basic arithmetic methods and some creativity; and although algebra generally makes work easier by making it routine, sometimes special-purpose thinking (once you have thought it!) can be quicker. Here we have a problem where a creative method didn’t quite make sense. Can we make sense of it? Can we fix it?

Here is Kalyan’s question, from early August:

Hello Doctor:

The question:

**10 years ago** from the present age, **ratio of John’s age to his father was 1:5.**

**After 6 years** from the present age, **ratio becomes 3:7.**

Find the present age of son and father.

Everyone knows to solve this problem using x and y and using **simultaneous equations** to find the x and y. But, how to solve the same problems using **ratios**, specially for Olympiads or any competitive exam to save time.

I saw a video on YouTube (it was not in English, so cannot post it here). But, I can tell you the trick.

First he took

| | John : Father |
|---|---|
| 10 years ago | 1 : 5 |
| 6 years hence | 3 : 7 |

Then, he said the **difference** between 1 and 3 in the ratio is 2 units, and the **difference** between 10 ago years and 6 years hence is 6-(-10) = 16 years.

So, 2 units = 16 years

Therefore, 1 unit = 8 years.

So, therefore, John’s age 10 years ago, 8 * 1 = 8 years.

So, after 10 years John’s age is 8 + 10 = 18 years.

Correspondingly, I can find father’s age.

My question:

**Why** is 3-1 unit difference has been taken, because 3-1 is not the actual age difference?

**How** this process was derived from the traditional x and y method, could you tell me something regarding this?

He’s right to question that! As we’ll see, taking the difference between terms in two different ratios is not really meaningful!

We’ll be looking at an algebraic solution; but an intuitive approach may or may not parallel that.

I answered:

Hi, Kalyan.

I’ll first show you **how I worked it out myself** without looking at **what you showed**; then I’ll see if there’s anything to say about the latter; and then I’ll try to relate it to the **algebra**, if I can.

Here’s the problem (slightly reworded):

10 years before the present time, the ratio of John’s age to his father’s was 1:5.

6 years after the present time, the ratio becomes 3:7.

Find the present age of son and father.

I first tried my own usual approach to such problems, knowing from experience a useful fact about ages of people at different times.

**My method:**

The key will be that **the difference in their ages is a constant.**

10 years ago, the ratio 1:5 tells us that that **difference was 4 times John’s age**, since it’s 5-1 = 4 parts to John’s 1.

In 6 years, 16 years later, the ratio 3:7 tells us that the difference will be 7-3 = 4 parts to John’s 3 parts, making it **4/3 of John’s age then.**

Inverting both facts: 10 years ago, John’s age was **1/4 of the (constant) difference**; 16 years later it will be **3/4 of that same difference.**

So **adding 16 to his earlier age multiplies his age by 3** (from 1/4 to 3/4); that is, the increase in age is 2 times the original age. So his age then must be 1/2 of 16, namely 8.

Their ages then were **8 and 40** (1:5 ratio, the difference being 32). Now, 10 years later, they are **18 and 50**; and in 6 more years they will be **24 and 56** (ratio 3:7).

By finding a constant to compare both ages to, we have a way to turn ratios into absolute numbers.
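The same reasoning can be written as a short Python sketch. This is my own illustration; the function name `ages_from_ratios` and its parameters are hypothetical, and exact fractions are used so that nothing is lost to rounding.

```python
from fractions import Fraction

def ages_from_ratios(a, b, c, d, gap):
    # John's age as a fraction of the constant age difference, at each time
    early = Fraction(a, b - a)   # 1/4 in the example
    late = Fraction(c, d - c)    # 3/4 in the example
    # The gap in years accounts for (late - early) of the difference
    difference = gap / (late - early)
    john_early = early * difference
    return john_early, john_early + difference

john, father = ages_from_ratios(1, 5, 3, 7, gap=16)
print(john, father)              # ages 10 years ago
print(john + 10, father + 10)    # present ages
```

Because each ratio is converted separately to a fraction of the one quantity that never changes, the “parts” of the two ratios are never mixed.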

Did the unknown YouTuber use a similar approach? Hopefully it will be easier to analyze now that we have a simple solution to compare it to. We know that his *answer*, at least, is correct (18 years old).

Now I’ll read the **method you’re asking about**, and paraphrase it here:

The data are:

| | John : Father |
|---|---|
| 10 years ago | 1 : 5 |
| 6 years hence | 3 : 7 |

The **difference between 1 and 3** in the ratio is 2 units.

The **difference between 10 years ago and 6 years from now** is 6-(-10) = 16 years.

So, **2 units = 16 years**.

Therefore, 1 unit = 8 years.

John’s age 10 years ago, 8 * 1 = 8 years.

After 10 years John’s age is 8 + 10 = **18 years.**

I see that I did the same 3 – 1 = 2, but I did it near the end, and that came from 3/4 – 1/4, which would not have been justified if the two denominators had not both been 4 (5-1 and 7-3).

He makes no mention (at least in your retelling) of the important fact that **5-1 and 7-3 are both 4** (or, equivalently, that **the difference 7-5 is also 2**). I don’t think this would work without that. For example, if the first ratio had been written as 2:10, then subtracting 3-2 = 1 and 7-10 = -3 would not give the same answer!

In general, **the “units” or “parts” used in a ratio are arbitrary**, and can’t be compared between different ratios. In my work, I used such “parts”, but in one ratio at a time.

Did he mention any of these other ideas, but you omitted them?

It is important to realize that the concept of “parts”, as in “1 part to 5 parts”, is local, referring only to one ratio; the “parts” will in general be different in other ratios. So it appears that the YouTuber either was lucky that the units were the same in both ratios, in some sense, or recognized that and failed to communicate this essential part of his thinking.

Later we’ll do a different example and see that the method as stated really does fail when the conditions of the problem change.

Kalyan had asked whether a simple method (such as the YouTuber’s) could be derived from the algebraic work, which he implied he found easy. That’s possible, so it was worth trying. First, I had to lay out a version of the algebraic solution for discussion:

Now let’s solve it by **algebra**, and see if any of what he or I did is reflected in that.

Let John’s age now be J, and his father’s F. Then the two statements become

(J-10):(F-10) = 1:5 [Ratio 10 years ago]

(J+6):(F+6) = 3:7 [Ratio 6 years from now]

Cross-multiplying,

5(J – 10) = F – 10

7(J + 6) = 3(F + 6)

Simplifying,

5J – F = 40

7J – 3F = -24

Multiplying the first equation by -3 and adding,

-8J = -144

J = 18

Solving for F,

5(18 – 10) = F – 10, F = 50

I see no 2 or 4 anywhere in here. Possibly we could see more similarity in a different approach.

That gave the correct answer, which is encouraging; but we never subtracted \(3-1\) or \(6-(-10)\). It is likely that those are hidden inside the work.
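For readers who want to confirm the elimination mechanically, here is a small sketch using Cramer's rule on the simplified system \(5J-F=40\), \(7J-3F=-24\). Cramer's rule is a technique I am swapping in for the elimination shown above, and the function name `solve_2x2` is my own.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    # Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# 5J - F = 40 and 7J - 3F = -24
J, F = solve_2x2(5, -1, 40, 7, -3, -24)
print(J, F)
```

The determinant here is -8, the same coefficient that multiplied J after the elimination step.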

One way to find such a hidden operation is to do the same work with variables, so that every operation is visible.

So now I want to make a **general formula**, and see if that has anything in common with the other methods. Let’s change the problem to this:

m years before the present time, the ratio of John’s age to his father’s was a:b.

n years after the present time, the ratio becomes c:d.

Find the present age of son and father **in terms of a, b, c, d**.

My algebraic solution:

Let John’s age now be J, and his father’s F. Then the two statements become

(J-m):(F-m) = a:b

(J+n):(F+n) = c:d

Cross-multiplying,

b(J – m) = a(F – m)

d(J + n) = c(F + n)

Simplifying,

bJ – aF = m(b – a)

dJ – cF = n(c – d)

Multiplying the first equation by c and the second by -a,

bcJ – acF = cm(b – a)

-adJ + acF = an(d – c)

and adding,

(bc – ad)J = cm(b – a) + an(d – c)

J = [cm(b – a) + an(d – c)] / [bc – ad] = [bcm – acm + adn – acn] / [bc – ad]

Applying this with m=10, n=6, a=1, b=5, c=3, d=7, we get

J = [3*10(5 – 1) + 6*1(7 – 3)] / [5*3 – 1*7] = (120 + 24)/(15 – 7) = 144/8 = 18

Notice the presence of both **b-a** and **d-c**, which both happen to be the same for our problem.

That shows that \(5-1=4\) and \(7-3=4\) are both used; but the \(3-1=2\) on which the solution in question relied is \(c-a\). That still isn’t visible in the algebra.

Our formulas for both unknowns are $$J=\frac{cm(b-a)+an(d-c)}{bc-ad}\\F=\frac{dm(b-a)+bn(d-c)}{bc-ad}$$

What if it were given that \(c-a=d-b\), as in our specific problem? Then also \(d-c=b-a\), and we could factor that out from our formula: $$J=\frac{(cm+an)(b-a)}{bc-ad}.$$ We can also replace \(d=b+c-a\) and get a formula without \(d\): $$J=\frac{(cm+an)(b-a)}{bc-ab-ac+a^2}=\frac{(cm+an)(b-a)}{(b-a)(c-a)}=\frac{cm+an}{c-a}.$$

We’ll be seeing that again …

What I did **without algebra**, replacing numbers with variables, looks like this:

m years ago, the ratio a:b tells us that the difference was (b-a)/a times John’s age, since it’s b-a parts to John’s a.

In n years, m+n years later, the ratio c:d tells us that the difference will be d-c parts to John’s c parts, making it (d-c)/c times John’s age then.

Reversing both facts, m years ago, John’s age was a/(b-a) of the (constant) difference; m+n years later it will be c/(d-c) times that same difference.

So adding m+n to his earlier age multiplies his age by [c/(d-c)]/[a/(b-a)] = [c(b-a)]/[a(d-c)]; that is, the m+n years is [c(b-a)]/[a(d-c)]-1 = [c(b-a)-a(d-c)]/[a(d-c)]= [bc-ad]/[a(d-c)] times his age then. So his age then must be [a(d-c)]/[bc-ad] times m+n, namely [a(d-c)(m+n)]/[bc-ad].

Now, m years later, it is [a(d-c)(m+n)]/[bc-ad] + m = [a(d-c)(m+n)+m(bc-ad)]/[bc-ad] = [adm – acm + adn – acn + bcm – adm] / [bc – ad] = **[bcm – acm + adn – acn] / [bc – ad]**.

And that’s the **same formula**.

With variables, the algebra was a lot easier to follow than the “intuitive method”, wasn’t it? But at least we see that my intuitive method agrees fully with the algebra. Does the other?

**What he did** amounts to

c – a units = m+n years.

So 1 unit = (m+n)/(c – a) years.

So John’s age m years ago was a(m+n)/(c – a), and

J = a(m+n)/(c – a) + m = **[an + cm]/[c – a]**.

Applying this with a=1, b=5, c=3, d=7, we get

J = [1*6 + 3*10]/[3 – 1] = 36/2 = 18

That is **not** the same formula, since b and d are absent; he is wrong in general, but just **happens to get the correct answer** here. I hope in a different problem he would do something different to account for b and d!

I don’t mind looking through a non-English video to see what I can extract from it; please provide the link.

The formula I got here agrees with what I found above given that \(c-a=d-b\).
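To see concretely how the two formulas part ways, here is a short sketch of my own comparing them. The function names `john_general` and `john_shortcut` are hypothetical; the second call rewrites the first ratio as 2:10, which describes exactly the same ages but breaks the condition \(c-a=d-b\).

```python
from fractions import Fraction

def john_general(a, b, c, d, m, n):
    # J = [cm(b - a) + an(d - c)] / (bc - ad), valid for any pair of ratios
    return Fraction(c * m * (b - a) + a * n * (d - c), b * c - a * d)

def john_shortcut(a, c, m, n):
    # J = (cm + an) / (c - a), justified only when c - a = d - b
    return Fraction(c * m + a * n, c - a)

# Original data: 1:5 ten years ago, 3:7 six years from now
print(john_general(1, 5, 3, 7, 10, 6), john_shortcut(1, 3, 10, 6))

# Same problem, with the first ratio written as 2:10 instead of 1:5
print(john_general(2, 10, 3, 7, 10, 6), john_shortcut(2, 3, 10, 6))
```

The general formula gives 18 both times, as it should, since 1:5 and 2:10 are the same ratio; the shortcut gives 18 and then 42.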

Might the YouTuber have actually made an easily overlooked comment explaining that he was using the fact that \(c-a=d-b\)? We never heard more on that.

There is another way to think about problems like this that involve ratios, and would commonly be solved by algebra. It uses what are variously called “tape diagrams” or “bar models”, and is commonly associated with what is called “Singapore math”. I had images similar to these in my mind when I devised my intuitive method, and on second thought, this seemed like a better way to explain it. So I wrote again:

Thinking about this again, it occurred to me that since I like thinking of ratio problems in visual terms, I should try that here.

Here is the ratio 10 years ago (1:5, reading bottom to top):

    +---+---+---+---+---+
    |                   | Father
    +---+---+---+---+---+
    |   | John
    +---+

Here it is 6 years from now (3:7):

    +---+---+---+---+---+---+---+
    |                           | Father
    +---+---+---+---+---+---+---+
    |           | John
    +---+---+---+

Since the latter is obtained by adding 16 years to both ages, I can line them up:

    ........+---+---+---+---+---+
    ........|                   | Father
    ...16...+---+---+---+---+---+
    ........|   | John
    ........+---+
    +---+---+---+---+---+---+---+
    |                           | Father
    +---+---+---+---+---+---+---+
    |           | John
    +---+---+---+

Clearly 16 years equals 2 parts, so each part is 8 years, and the initial ages are 8 and 40.

The idea here is to make bars representing the two quantities, showing the “parts” used in each ratio without knowing their actual size, and relate them to one another. For this problem, little extra thinking was needed because the “parts” in both ratios represented the same number of years (the difference in the ages being 4 parts in both cases), so that we could just add the 16 years and see that both ages increased by two “parts” from one time to the other. But that won’t always happen:

And this is exactly what your video maker did …

But **I could do that only because the difference of terms in each ratio was 4 units**.

Again, if he said something about the differences both being 4, then his work is valid.

One might well solve the problem this way and never make note of the “happy coincidence” that made it easy. With less helpful numbers, we’d be forced to pay attention:

Now, what if we had a problem where the differences in the ratios are **not** the same? Here is one:

Today, the ratio of John’s age to his father’s is **1:6**. In 6 years, it will be **1:3**. How old are they?

I can make the same sort of bars, but they don’t work:

    ........+---+---+---+---+---+---+
    ........|                       | Father
    ...6....+---+---+---+---+---+---+
    ........|   | John
    ........+---+
    +---+---+---+
    |           | Father
    +---+---+---+
    |   | John
    +---+

The bars don’t have the same scale in each ratio, as we see from the fact that the difference of ages is not the same; so we can’t just add years. We need to rescale the diagrams so that the difference is the same size in each; for that purpose, we can do the same sort of thing we do to add, or compare, fractions using a common denominator (though here it is not a denominator).

To make them line up, I need to make **equivalent ratios** such that the **differences**, rather than being 5 and 2, **will both be 10** (the LCM). So I multiply by 2 and 5, respectively, rewriting 1:6 as 2:12, and 1:3 as 5:15. The new bars do line up correctly:

    ............+---+---+---+---+---+---+---+---+---+---+---+---+
    ............|                                               | Father
    ......6.....+---+---+---+---+---+---+---+---+---+---+---+---+
    ............|       | John
    ............+---+---+
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |                                                           | Father
    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
    |                   | John
    +---+---+---+---+---+

Now 6 years corresponds to 3 parts; each part is 2 years, so **John was 4 and his father was 24**. (In 6 years, they will be 10 and 30, with the required ratio.)

This visual method helps to make sense of everything, without needing the structure that algebra provides.
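The rescaling step can also be sketched in Python. This is only an illustration under the assumption that the father is older in both ratios (so both differences are positive); the function name `align_ratios` is my own.

```python
from fractions import Fraction
from math import gcd

def align_ratios(r1, r2):
    # Scale each ratio so that (second term - first term) is the same in both
    d1, d2 = r1[1] - r1[0], r2[1] - r2[0]
    common = d1 * d2 // gcd(d1, d2)      # LCM of the two differences
    k1, k2 = common // d1, common // d2
    return (r1[0] * k1, r1[1] * k1), (r2[0] * k2, r2[1] * k2)

now, later = align_ratios((1, 6), (1, 3))
parts_gained = later[0] - now[0]          # parts added to John's bar in 6 years
years_per_part = Fraction(6, parts_gained)
john, father = now[0] * years_per_part, now[1] * years_per_part
print(now, later, john, father)
```

Once the differences match, a “part” means the same number of years in both ratios, and only then is it legitimate to subtract terms across them.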

We can, of course, solve this problem with algebra, too. Here’s the work equivalent to what I did before:

Let John’s age now be *J*, and his father’s *F*. Then the two statements become

\(J:F = 1:6\) [Ratio now]

\((J+6):(F+6) = 1:3\) [Ratio 6 years from now]

Cross-multiplying,

\(6J=F\)

\(3(J+6)=F+6\)

Simplifying the second,

\(3J+18=F+6\)

Substituting from the first equation into the second,

\(3J+18=6J+6\)

\(12 = 3J\)

\(J = 4\)

Solving for *F*, \(F=6(4) = 24\).

This time it was probably less work with algebra, in part because I gave data for “now” as one of the equations. But each method can be good, as long as we pay attention to the reason we can take each step.

We often see polynomials in a simplistic way, imagining that any function whose graph *resembles* a polynomial *is* a polynomial. Much as an attempt to mimic random data often lacks essential properties of genuine randomness, so what we intend to be a polynomial often is not. As we observe more polynomials, we can start to recognize more of the constraints to their shapes; but even teachers can sometimes design problems that are not quite what they seem. Here I will analyze a graph to death, pointing out what it is not, before answering the question as it was presumably intended.

Here is the question, sent in by Randy on the last day of July:

How can I write a Polynomial that can represent the graph below?

This looks very much like a polynomial; yet I could recognize from experience that it was probably hand-drawn in some sense, not made by actually graphing a polynomial. It just looks a little too tame. I decided to take this as an opportunity to “play”, experimenting with this open-ended question and taking it further than was surely intended, to check my intuition.

In particular, I observed that the phrase “*can* represent” is ambiguous. How accurate must the representation be? What if there is no polynomial that actually *does* represent the graph?

I answered:

Hi, Randy.

That depends on exactly what you hope to do! I hope you’ll tell me the context of your question, whether it’s an assignment or just a matter of curiosity. Meanwhile, I’m going to have fun with it.

Since no coordinates are shown, you may just mean, how could you write a polynomial whose graph would look **generally** like that, with two local maxima and two local minima. It is easy to see the **minimal degree** for such a polynomial.

Since the graph changes direction 4 times, and can be seen as 5 segments going up-down-up-down-up, its degree would have to be *at least* 5. We could make something like this by writing 5 factors corresponding to 5 zeroes more or less evenly spaced, scaling it so it fits, and perhaps shifting it up or down a little. Here, for example, is the graph of $$f(x)=\frac{1}{10}(x+4)(x+3)(x+1)(x-1)(x-2)\\=\frac{x^5+5x^4-3x^3-29x^2+2x+24}{10}$$

Shifting this down about 1.6 would make it look a lot like our goal.
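A rough numerical check of the “changes direction 4 times” claim is easy to sketch. The code below is my own illustration: it samples the example quintic on a fine grid and counts how often the values switch between rising and falling. The function name `count_turning_points`, the interval, and the grid size are all arbitrary choices, and a sampling approach like this could miss very closely spaced turning points.

```python
def f(x):
    # The example quintic with zeros at -4, -3, -1, 1, 2
    return (x + 4) * (x + 3) * (x + 1) * (x - 1) * (x - 2) / 10

def count_turning_points(func, lo, hi, steps=2000):
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [func(x) for x in xs]
    # Direction of each small step (+1 rising, -1 falling), ignoring exact ties
    signs = [1 if b > a else -1 for a, b in zip(ys, ys[1:]) if a != b]
    # A turning point is wherever the direction flips
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

print(count_turning_points(f, -5, 3))
```

Four direction changes means the derivative has at least four real zeros, so the derivative has degree at least 4 and the polynomial itself has degree at least 5.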

Here are the five segments I imagine when I see it, which suggest degree 5:

Not all polynomial graphs are quite so clear-cut; this is more typical: $$g(x)=\frac{1}{30}(x+4)(x+2)(x^2-2x+3)(x-2)\\=\frac{x^5+2x^4-9x^3+4x^2+20x-48}{30}$$

There the segments are more subtle, because the change of direction between them is not always from up to down; the slope just alternately decreases and increases:

Anyway, here’s what we get by shifting my function down:

But probably a little more is expected:

Or, you might want to estimate the **x- and y-intercepts** and write a polynomial using the appropriate factors. If you do that, though, you’ll find that a polynomial of minimal degree that has these intercepts doesn’t really match the graph. Here is what I get using GeoGebra:

Here I used the given graph as a background image in GeoGebra, adjusting its size so that the zeros were approximate integers (-5 with multiplicity 2; -1; 1; and 6), and then graphed a polynomial with those intercepts, namely $$f(x)=\frac{1}{75}(x+5)^2(x+1)(x-1)(x-6)\\=\frac{x^5+4x^4-36x^3-154x^2+35x+150}{75}$$ I chose to divide by 75 in order to adjust the vertical scale so that the *y*-intercept would be 2, rather than 150.

The relative minimum near \(x=4\) is far too low.
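The choice of 75 and the depth of that minimum can both be checked with exact arithmetic. In this sketch (my own, with hypothetical names `ZEROS`, `unscaled`, and `divisor`), the divisor is computed from the desired *y*-intercept, and \(x=4\) is used only as a sample point near the minimum, not as its exact location.

```python
from fractions import Fraction

ZEROS = (-5, -5, -1, 1, 6)   # -5 is listed twice for its multiplicity of 2

def unscaled(x):
    # Product of the factors (x - z), before any vertical scaling
    result = Fraction(1)
    for z in ZEROS:
        result *= Fraction(x) - z
    return result

target_intercept = 2
divisor = unscaled(0) / target_intercept   # constant to divide by

def f(x):
    return unscaled(x) / divisor

print(divisor, f(0), float(f(4)))
```

With the *y*-intercept pinned at 2, the value at \(x=4\) comes out around -32, which is why the curve plunges far below anything in the given graph.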

If I adjust this (by dividing by a constant) to make it fit in the grid, I get this:

Many textbooks give the false impression that any polynomial-like curve is just what it looks like. But our general impression tends to be wrong.

That is the graph of $$f(x)=\frac{1}{300}(x+5)^2(x+1)(x-1)(x-6)=\frac{x^5+4x^4-36x^3-154x^2+35x+150}{300}$$

So, why is this so far from fitting the heights of the relative extremes? The problem is that a fifth-degree polynomial only has six parameters to adjust (either the coefficients in expanded form, or the zeros and the constant multiplier in factored form). Once we’ve chosen the intercepts, we have no more freedom to adjust individual heights; we have to take what we get. To put it another way, adjusting horizontal features (the *x*-intercepts) entails vertical features (slopes and extremes) in ways we can’t control.

Could we use a higher degree, with more parameters, and get closer? I’m sure that isn’t what was intended, but …

Just for fun, I tried **multiplying this polynomial by a quadratic** and manually adjusting the coefficients (using sliders in GeoGebra) to try to get a better fit. Here is the best I could do:

That has degree 7.

Its equation is $$f(x)=\frac{1}{300}(x+5)^2(x+1)(x-1)(x-6)(0.1x^2-1.1x+3.9)\\=\frac{x^7-7x^6-41x^5+398x^4+325x^3-6241x^2-285x+5850}{3000}$$ Because the quadratic I multiplied by has no real roots, this didn’t change the intercepts, but did warp the overall shape in subtle ways (similar to what we saw in my second quintic graph above, but even more subtle). Yet it couldn’t reduce the slope of the outer parts, where the real polynomial shoots up and down far more steeply than we expect.
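Both claims, the expanded form and the absence of real roots, can be verified with a short sketch. This is my own illustration; to keep everything in integers, the quadratic is multiplied by 10 (giving \(x^2-11x+39\)), which is why the overall divisor becomes 3000 rather than 300.

```python
def poly_mul(p, q):
    # Multiply polynomials given as coefficient lists, highest power first
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

quintic = [1, 4, -36, -154, 35, 150]   # (x+5)^2 (x+1)(x-1)(x-6), expanded
quadratic = [1, -11, 39]               # 10 * (0.1x^2 - 1.1x + 3.9)

septic = poly_mul(quintic, quadratic)
discriminant = quadratic[1] ** 2 - 4 * quadratic[0] * quadratic[2]
print(septic, discriminant)
```

A negative discriminant confirms that the quadratic factor never crosses zero, so it contributes no new *x*-intercepts.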

To do better, we could identify a few more points on the graph and plug them into, say, a general degree-11 polynomial and solve for the coefficients; but that’s actually worse (here using GeoGebra’s FitPoly function and the points in green):

If you try to squeeze a polynomial to fit the shape you want, it pops out in other places!

As an alternative, I entered ten of those points into a spreadsheet (because the highest degree it could handle is 9) and made a polynomial trendline; here is what it gave, a little better than GeoGebra’s:

$$\frac{0.3004x^9-1.04921x^8-13.0611x^7+11.9431x^6+179.65x^5+1125.12x^4-1307.43x^3-21135.9x^2+1140.54x+20000}{10000}$$

Other attempts using a different set of points made the match far worse.

If I really wanted to get it right, I’d use the **derivative** at the turning points, as well as the function values; but I’d have to do that manually, and since I’m doing this “just for fun”, I’m not going to try just yet!

The point I’m really making is that any squiggle you make that looks like a polynomial to you, is probably a whole lot more complicated than you realize!

When you tell me what the actual goal is, we can discuss more details or what you might actually be expected to do.

All of this is to say that when *I* write a problem like this, I am careful to provide the graph of the actual polynomial I want as an answer, even if I don’t demand that the answer be exact. I don’t want students to do the right work, but then waste time struggling to decide whether the differences they find between the graph of their function, and the graph I provided, are enough to make their answer wrong. I don’t want to ask what function “could” have this graph, when in fact no polynomial actually *does*.

But we need to move on to the intended meaning of an imperfect problem. I had intentionally not given the equations I was graphing, so Randy would have work to do, even after I’d demonstrated the possibilities.

Randy replied the next morning with an attempt, showing he understood the main ideas, at least:

After hours of thinking, I got **f(x) = 1/20 (x + 4)^2 (x+0.01) (x-2) (x-5)**.

Is this a valid answer?

The only thing that is really questionable about this is the unexplained number 0.01.

I answered,

If all you want is the **same general shape** (as in my initial comment), then yes. Here is the graph, which looks quite good in that respect:

It does have the right number of ups and downs; but it has zeroes at -4 (with multiplicity 2), -0.01, 2, and 5, rather than a peak at 0; and the *y*-intercept is 0.08. (I rescaled the image to match his choice of -4 and 5 for the first and last zeros.)

You (or your teacher?) have to define what you consider to be a valid answer, since as I showed, you aren’t going to get an exact match.

Does it have to have a peak at x=0, for example?

Since there was no scale, you don’t have to use the same numbers I show for intercepts, but I would expect not to have an intercept so close to 0. For my functions, I used x-intercepts -5, -1, 1, and 6, having scaled the graph to put all the intercepts approximately at integers, but otherwise probably did the same sort of thinking you did.

Was this an assigned exercise? How was it worded, exactly? What topics were most recently discussed? Are you allowed/expected to use a grapher like my GeoGebra, or Desmos?

Since his main error seemed to be the use of 0.01 instead of -1, I stated the intercepts I used; the most important piece of the exercise is in using those, not choosing them.

I was hoping that something in the context would give us a clearer idea of what constitutes a satisfactory answer. I expected the zeros to be the main concern, but if the context were calculus and there had been numbers on the axes, more might be demanded.

Randy replied with the entire problem:

That’s the question I got.

The zeroes he wrote in are similar to mine, and not what he used for his equation.

But we never learned what part (b) is, which could be interesting.

I answered:

The word “could” is probably intended to acknowledge that your answer only needs to match the basic visible features, without needing to exactly match, which is good.

The numbers you added to it for intercepts more or less agree with mine; use those, and you should be good.

The only things about your answer of f(x) = 1/20 (x + 4)^2 (x+0.01) (x-2) (x-5) that don’t fit are the specific numbers 0.01, 2, and 5 (only the first of these being definitely wrong), and the value of f(0).

So use your numbers -4, -1, 1, 6 instead of -4, -0.01, 2, 5; and then redo the calculation of the multiplier to obtain the y-intercept.

Using his intercepts, and setting the vertical scale to agree with the relative minimum, we get the equation $$f(x)=\frac{1}{240}(x+4)^2(x+1)(x-1)(x-6)$$ which looks like this:

That was probably acceptable, given the vague statement of the problem.

The next day, we got a question from Charli that raises a similar issue. This discussion didn’t go very far (in part because this is a graded assignment, so we only gave general suggestions, and also because she never read our last reply), but the problem was to write a polynomial for a given curve:

This doesn’t actually say that our polynomial should match the picture, but it’s implied. What we are told to do is to write a polynomial given specified intercepts. The answer to (a) is $$f(x)=(x+9)(x+3)(x-6)^2$$ This takes the *x*-intercepts into account, but not the *y*-intercept. (A monic polynomial is one whose leading coefficient is 1.) Since for that function \(f(0) =972\), in order to make \(f(0)=-2\) we need to change the sign and divide by 486: $$f(x)=-\frac{1}{486}(x+9)(x+3)(x-6)^2$$ Here is the graph of that function, superimposed on the picture:

That mostly fits the lower part, but totally misses the upper part – if for no other reason, just because the intercept at \((-9,0)\) is totally wrong. If we estimate that the leftmost intercept should have been about -16 rather than -9, we get $$f(x)=-\frac{1}{864}(x+16)(x+3)(x-6)^2$$ which looks like this:

Not quite, though it does fit the lower part better! Here is perhaps the best we can do: $$f(x)=-\frac{1}{594}(x+11)(x+3)(x-6)^2$$

So although the problem as stated is easy, the picture is not really quite right. In general, not every roller coaster you want to make can be described as a polynomial!
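The multipliers in the three scaled polynomials above can be checked mechanically. Here is a minimal Python sketch; the helper name `scale_for_y_intercept` is my own, not part of the original exercise, and it simply divides the target *y*-intercept by the value of the unscaled product of factors at \(x=0\):

```python
from fractions import Fraction

def scale_for_y_intercept(roots_with_mult, target):
    """Return the constant a so that a * prod (x - r)^m equals target at x = 0.

    roots_with_mult is a list of (root, multiplicity) pairs (hypothetical helper).
    """
    value_at_zero = Fraction(1)
    for root, mult in roots_with_mult:
        value_at_zero *= Fraction(0 - root) ** mult
    return Fraction(target) / value_at_zero

# Leftmost intercept at -9, -16, and -11, as in the three attempts above;
# the other zeros are -3 (simple) and 6 (double), and the target is f(0) = -2.
for left in (-9, -16, -11):
    a = scale_for_y_intercept([(left, 1), (-3, 1), (6, 2)], -2)
    print(left, a)
```

Exact fractions are used so that the result can be compared directly with the coefficients written above, without rounding.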

Just as the next part of the problem presumably asks for the **piecewise function** that includes the straight segment, a real model would use yet a third piece for the top part. And we would choose that function by matching the *slopes*, not just *location*, at the point where they meet. And that would make this a calculus problem.

Limits of indeterminate forms like ∞ – ∞ require us first to recognize the form, and then, often, use L’Hôpital’s rule (also called L’Hospital’s rule, as we’ll be seeing it here), or some other method. Today’s question will touch on all stages of this work for three examples, but focus on the beginning.

Here is Anwar’s question, from mid-July:

Hello doctor.

Can I get the form (infinity – infinity) without knowing the graph of secant?

Here is my question:

He evidently wants to be certain that the first term approaches negative infinity, so that the form is \(\infty-\infty\), without having to memorize the graph of the secant function. He has nicely used the graph of the cosine to show the limit of the secant: As *x* decreases toward \(\frac{\pi}{2}\), the cosine rises to 0. Finding the form is not the end of the problem, but is his main concern; we’ll finish the work after dealing with this initial stumbling block.

Doctor Fenton answered:

Hi Anwar,

I’m not quite sure what you are asking, but if you are asking whether you can say that

\(\lim_{x\to\pi/2^+}\sec x=-\infty\)

from knowing that

\(\lim_{x\to\pi/2^+}\cos x=0\)

and that for π/2 < x < π, cos x < 0, I would say yes. In the interval [π/2,π], sec x = 1/cos x, so the fact that \(\lim_{x\to\pi/2^+}\cos x=0\) means that sec x will be a large negative number, unbounded, for values of x in the second quadrant near π/2, i.e. \(\lim_{x\to\pi/2^+}\sec x=-\infty\).

Then \(\lim_{x\to\pi/2^+}(\sec x-\tan x)\) is a limit of the difference of two very large negative numbers, since tan x < 0 in the second quadrant also. The limit of the difference is an indeterminate form (-∞)-(-∞), which is the same as ∞-∞. But changing the difference sec x – tan x into (1/cos x)-(sin x / cos x) and combining the two terms into (1-sin x)/cos x allows you to use L’Hospital on the resulting indeterminate form 0/0.

Does that answer your question?

The fact, as shown in Anwar’s graph, that the cosine approaches 0 *from the negative side* is a key part of his work; the fact that the limit is 0 is not enough. It might be wise to write, not \(\displaystyle\frac{1}{0}\), but \(\displaystyle\frac{1}{0^-}=-\infty\).

There are other ways to determine this (in addition to just knowing the graph of the secant!), but his work shows good understanding.

Here are the graphs of \(\sec(x)\) (red) and \(\tan(x)\) (blue), showing how both approach \(-\infty\) as we approach \(\frac{\pi}{2}\) from the right:

One might assume that the limit is zero because it looks like the curves come together; but as we’ll be seeing, such limits can easily surprise us.

Doctor Fenton has also shown the start of the next step, converting from the form \(\infty-\infty\) to the form \(\frac{0}{0}\) so that L’Hospital can be used.

Anwar replied, showing another approach to the step he’s asked about:

Thank you Doctor Fenton for replying.

This is a powerful information I had forgotten.

Yes, you did answer me but I have to ask again to be sure.

Is my understanding below right?

This time, rather than use the graph of the cosine, he has used the unit circle, which is very useful for this sort of thinking. As the angle decreases toward \(\frac{\pi}{2}\), in the second quadrant, the cosine remains negative but approaches 0; so its reciprocal will be negative but becoming infinite, which we describe as approaching \(-\infty\).

But we should keep in mind that it is not really **equal** to \(-\infty\); the notation Anwar is using indicates only the **form** of the limit, and tells us we can’t yet evaluate the limit. That will be the task of L’Hospital (or an alternative).

Doctor Rick interjected a question and comment, observing that Anwar had titled the question “Limit Question About l’Hospital Rule”, and this form is not ready for the rule:

Hi, Anwar. While you’re waiting for Doctor Fenton’s response, I have a question for you: What do you intend to do next?

So far you have only established that the expression whose limit you want is of the form ∞ – ∞, so it is indeterminate. How will you find its limit?

I can think of two ways to solve the problem that do not involve an expression of the form ∞ – ∞.

We’ll see a couple of these later.

Doctor Fenton now responded to Anwar’s work:

Yes, your argument agrees with what I was saying.

You still cannot use L’Hospital directly on this expression, although you can change it to an equivalent expression to which L’Hospital applies, and as Dr. Rick points out, there are other ways to find the limit without using L’Hospital at all.

Anwar answered each. First, for Doctor Rick:

Hello Doctor Rick, thanks for replying.

I’m afraid that I know only one way to solve this limit, which I did by handwriting below, I’m afraid that I do not know a way to solve it which do not involve the form ∞ – ∞

This is in fact what Doctor Fenton had initially suggested; we can’t *ignore* the initial form \(\infty-\infty\), but can *change* it to another form, \(\frac{0}{0}\), and then use L’Hospital. After rewriting and confirming the form, which makes it suitable for L’Hospital, he differentiated the numerator and the denominator and found that the form is now \(\frac{0}{1}\), which implies that the limit is 0. Another way to express this would be to say that the new expression, \(\displaystyle\frac{-\cos(x)}{-\sin(x)}\), is **continuous** at \(\frac{\pi}{2}\), so that we can just replace *x* with \(\frac{\pi}{2}\) and get the limit.
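Written out, the step just described looks like the following. This is my own reconstruction from the description, not a copy of Anwar’s handwritten work: $$\lim_{x\to\frac{\pi}{2}^+}\frac{1-\sin(x)}{\cos(x)}\overset{L'H}=\lim_{x\to\frac{\pi}{2}^+}\frac{-\cos(x)}{-\sin(x)}=\lim_{x\to\frac{\pi}{2}^+}\frac{\cos(x)}{\sin(x)}=\frac{\cos\left(\frac{\pi}{2}\right)}{\sin\left(\frac{\pi}{2}\right)}=\frac{0}{1}=0$$

The rule applies only because the fraction on the left has the form \(\frac{0}{0}\); the last step is plain substitution, which is allowed because the new fraction is continuous at \(\frac{\pi}{2}\).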

The graph shows that the limit of the difference (green) is in fact 0:

At the end, we’ll be seeing methods that do not require L’Hospital; one of them is suggested by the shape of the green graph.

Then Anwar responded to Doctor Fenton with two new problems:

Hello Doctor Fenton, thanks for replying.

The doctor who is teaching me and will make the exam needs me to be very clear about the indeterminate form before solving the question, but this step is my main problem with L’Hospital Rule specially for two sided limits that do not involve a direction. I know the graphs of the simple functions and how to obtain the limit using the graphs, but I do not know why in some cases in the textbook ln(0)=∞ and (1/0)=∞, the textbook do not give an explanation of how he did that, I hope you have the time to see my handwriting here:

Part of his confusion about the second problem may lie in the fact that he graphed \(\frac{1}{x}\) rather than \(\frac{1}{x^2}\) which is more relevant; the latter approaches \(+\infty\) because it is *positive on both sides of zero*, whereas his graph approaches different limits on each side, so that limit simply “Does Not Exist”.

In the first problem, the argument of each log is approaching 0 from above, and that one-sided limit is \(-\infty\).

Doctor Fenton replied at length, starting with the notation issue:

∞ is not a number, so strictly speaking, a statement “1/0 = ∞” is meaningless. However, it can be a useful summary of a certain situation. You ask whether 1/0 is always ∞ in limits. First of all, 1/0 is only supposed to indicate that we are dealing with a situation that involves a number close to 1 divided by a very small number. If the small number is always positive, then the quotient will always be a very large positive number, so 1/0 = ∞ just denotes this fact; while if the denominator is always negative, 1/0 would always be a very large negative number, so it would be better to write 1/0 as -∞; and if the denominator could be either a very small positive number or a very small negative number, 1/0 isn’t very useful: it will always be a very large number, but you can’t predict from just this information whether that number is positive or negative.

These indeterminate forms arise when you try to use direct substitution to evaluate the limit, such as when you evaluate \(\lim_{x\to2}(2x+3)=7\) by direct substitution. Often, such “direct substitution” gives undefined expressions 0/0 or ∞/∞, which again simply indicates a need for more analysis.

Using the notation Anwar is using, it would be appropriate to write \(1/0^+=+\infty\) or \(1/0^-=-\infty\), when possible.

The last comment is akin to what is said in Division by Zero and the Derivative, that an indeterminate form like these is “a sign saying ‘bridge out – road closed ahead’, that forces us to take a detour to get to our goal.”

Looking at the first new problem, he says,

You ask why the limit

\(\lim_{x\to1^+}\left[\ln(x^3-1)-\ln(x^5-1)\right]\)

is written as ∞-∞. If x is a number close to 1 but slightly larger than 1, \(x^3-1\) and \(x^5-1\) are both positive numbers close to 0, so \(\ln(x^3-1)\) is the logarithm of a positive number close to 0, and you know that as y→0 from the right, ln(y) has a very large negative value, which we indicate with a symbolic statement ln(y) ≈ -∞, or in your case, ln(0) = -∞. It would be more accurate to write something like \(\ln(0^+)=-\infty\), so it would be better in my opinion to write

\(\lim_{x\to1^+}\left[\ln(x^3-1)-\ln(x^5-1)\right]=\ln(0^+)-\ln(0^+)=-\infty-(-\infty).\)

But this means the same thing as ∞-∞, i.e. the difference of two very large quantities with the same algebraic sign. All that does is to alert you to the fact that a more careful analysis is needed.

So this tells us that we will need to use L’Hospital’s rule, or some other alternative. If the form had turned out to be \(\infty+\infty\) or \(-\infty-\infty\), the limit would just be \(\infty\) or \(-\infty\).

Here is the graph of the two logarithm functions:

This looks much like our first graph; do you think the limit will again be 0? Let’s find out. First, we can simplify:

$$\lim_{x\to1^+}\ln\left(x^3-1\right)-\ln\left(x^5-1\right)=\lim_{x\to1^+}\ln\left(\frac{x^3-1}{x^5-1}\right)$$

We’ve transformed the limit into the log of an expression of the form \(\frac{0}{0}\), so we can apply L’Hospital’s rule to that:

$$\lim_{x\to1^+}\frac{x^3-1}{x^5-1}\overset{L’H}=\lim_{x\to1^+}\frac{3x^2}{5x^4}=\lim_{x\to1^+}\frac{3}{5x^2}=\frac{3}{5}$$

So $$\lim_{x\to1^+}\ln\left(x^3-1\right)-\ln\left(x^5-1\right)=\lim_{x\to1^+}\ln\left(\frac{x^3-1}{x^5-1}\right)=\ln\left(\lim_{x\to1^+}\frac{x^3-1}{x^5-1}\right)=\ln\left(\frac{3}{5}\right)\approx-0.5108$$

Here I’ve added the graph of the difference (in green), which agrees:

Without the work, though, the graph might suggest it’s exactly -0.5!

We could also solve this without L’Hospital, by simplifying further:

$$\lim_{x\to1^+}\ln\left(\frac{x^3-1}{x^5-1}\right)=\lim_{x\to1^+}\ln\left(\frac{(x-1)(x^2+x+1)}{(x-1)(x^4+x^3+x^2+x+1)}\right)=\\\lim_{x\to1^+}\ln\left(\frac{x^2+x+1}{x^4+x^3+x^2+x+1}\right)=\ln\left(\frac{1+1+1}{1+1+1+1+1}\right)=\ln\left(\frac{3}{5}\right)$$
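A numerical check makes the same point as the graph. The sketch below uses only Python’s standard `math` module; the function name `log_difference` and the sample step sizes are my own choices. It evaluates the original difference of logarithms at points slightly to the right of 1 and compares with \(\ln\left(\frac{3}{5}\right)\):

```python
import math

def log_difference(x):
    """ln(x^3 - 1) - ln(x^5 - 1), defined only for x > 1."""
    return math.log(x**3 - 1) - math.log(x**5 - 1)

target = math.log(3 / 5)

# Approach 1 from the right with smaller and smaller steps h.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    x = 1 + h
    print(f"x = {x:.4f}   difference = {log_difference(x):.6f}")

print(f"ln(3/5)      = {target:.6f}")
```

Each term on its own heads off toward \(-\infty\), yet the printed differences settle near \(-0.5108\), not \(0\) and not \(-0.5\).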

Moving to the second problem,

Similarly, if you try to use direct substitution for

\(\lim_{x\to0}[\cos(x)]^{1/x^2}\),

the result is \(1^\infty\), or better yet \(1^{+\infty}\), since that just means a number close to 1 raised to a very large positive exponent. ∞ does not represent a specific value, so these are just symbolic statements.

We’ll come back to this after examining the general concepts.

When we write an ordinary limit statement, \(\lim_{x\to a}f(x)=L\), we are saying that if we evaluate f(x) for a value of x which is very close to a, the value of f(x) will be very close to L. In the statement

\(\lim_{x\to0^+}1/x=\infty\),

we are actually saying that “the limit of 1/x as x approaches 0 from the right (i.e. through positive values) does not exist: that is, there is no finite number L such that 1/x will approach L more and more closely as x approaches 0 from the right. However, the behavior of the function is predictable, because as x gets closer to 0 from the right, the values of 1/x become larger and larger positive numbers, which will exceed all bounds.”

Ordinary limits indicate predictable behavior. \(\lim_{x\to2}(2x+3)=7\) means that as I evaluate 2x+3 for values of x getting closer and closer to 2, the values will be closer and closer to 7. So while \(\lim_{x\to0^+}1/x\) does not exist, the symbolic statement \(\lim_{x\to0^+}1/x=\infty\) does provide useful information about the behavior of the function (1/x) as \(x\to0^+\): the function values become larger and larger positive numbers.

When we call a limit “infinity”, we are saying a lot! But we are not saying that the limit is a number called infinity!

This is quite different from situations where there is no limit at all:

You can compare this situation to trying to find \(\lim_{x\to0^+}\sin(2\pi/x)/x\). In every interval [1/(n+1),1/n], the numerator sin(2π/x) goes through a complete oscillation: going left from x = 1/n, the function goes up to 1 at x = 1/(n+(1/4)), back down to 0 at x = 1/(n+(1/2)), on down to -1 at x = 1/(n+(3/4)), and back up to 0 at x = 1/(n+1). Meanwhile, the amplitude of the oscillation is steadily increasing, so that the values in this interval can be anywhere between -(n+1) and n+1. The values of sin(2π/x)/x are completely unpredictable from the knowledge only that “x is a positive number close to 0”.

Here is a graph of this function, showing the interval \(\left[\frac{1}{n+1},\frac{1}{n}\right]\) for \(n=1\) in blue:

Similar oscillations repeat over and over as we approach zero, getting narrower and taller, so that the function approaches no value (or every value!).
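To see the growing swings in numbers rather than in a picture, here is a small Python sketch (standard library only; the function name `g` is just a label of mine). It samples the function where the numerator \(\sin(2\pi/x)\) equals \(+1\) and \(-1\), namely at \(x=\frac{1}{n+1/4}\) and \(x=\frac{1}{n+3/4}\):

```python
import math

def g(x):
    """sin(2*pi/x) / x for x > 0."""
    return math.sin(2 * math.pi / x) / x

for n in (1, 10, 100):
    peak = g(1 / (n + 0.25))    # numerator is +1 here, so g is about n + 1/4
    trough = g(1 / (n + 0.75))  # numerator is -1 here, so g is about -(n + 3/4)
    print(n, round(peak, 6), round(trough, 6))
```

Both sample points lie inside \(\left[\frac{1}{n+1},\frac{1}{n}\right]\), so within every such interval the function swings between roughly \(n\) and \(-n\). No single value, finite or infinite, describes that behavior.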

Indeterminate forms arise because there is sort of a “competition” between different parts of a formula. For example, in a limit \(\lim_{x\to a}f(x)/g(x)\),

- if f(x) approaches 0 while g(x) approaches a finite (nonzero) value L, the fraction is being driven toward 0 by the small numerator, while
- if f(x) approaches a finite positive limit L while the denominator g(x) approaches 0 (taking only values of one sign, for example, all values of g(x) are positive), then the fraction takes larger and larger positive values.

It is useful to describe this by saying that \(\lim_{x\to a}f(x)/g(x)=+\infty\), so that we know that the graph of f(x)/g(x) has a vertical asymptote, and the function goes off to +∞.

If f(x) and g(x) both approach 0, the result depends upon the relative sizes of the small quantities:

- if \(f(x)=x^3\) and \(g(x)=x^2\), f(x) “wins” and drives the fraction towards 0, while
- if \(f(x)=|x|\) and \(g(x)=x^2\), g(x) “wins” the competition to approach 0, and drives the fraction to “+∞” (i.e. larger and larger values without any bound on how large they can be).

So the fact that both numerator and denominator approach 0 sets up a competition in which either might “win”. What if the competitors are equal?

But if \(f(x)=3x^2\) and \(g(x)=x^2\), then the competition is a stalemate, and the fraction approaches a finite limit, 3. Writing the limit \(\lim_{x\to a}f(x)/g(x)\) as 0/0 simply indicates that the numerator and denominator are both approaching 0, and more analysis is needed to determine what happens, and what the relative sizes of the two functions are. The same is true if both f and g have a vertical asymptote: which function becomes relatively larger than the other, or are they comparable (i.e. one function becomes approximately a constant multiple of the other)?

Similar situations arise with other expressions, leading to other indeterminate forms.

∞-∞ means that two functions are both becoming unbounded, but have the same sign, so their values at least partly cancel. Does one function’s values dominate the other function’s (e.g. \(x^2-x\), \(x-x^2\)), or are they more comparable (e.g. \((x^2+\arctan x)-(x^2-\arctan x)\))?

If you have an exponential \(f(x)^{g(x)}\), where f(x)→1 and g(x)→∞, if f(x) > 1, then since \(1^y=1\) for any real y, the base approaching 1 tends to drive the exponential to the value 1, but raising a number larger than 1 to a large exponent tends to make the exponential large, so again there is a competition: the base approaching 1 drives the exponential to 1, while the large exponent can drive the exponential to +∞. This situation is indicated by the indeterminate form \(1^\infty\).

Whereas 1 raised to any (finite) power is 1, and any (finite) number greater than 1 raised to a very large power is very large (while a number *less* than 1 raised to a large power approaches 0), raising 1 to an infinite power (in the form of a limit) might be anything.

A good example of this form is the limit $$\lim_{n\to\infty}(1+\frac{1}{n})^n=e,$$ or equivalently $$\lim_{x\to0}(1+x)^\frac{1}{x}=e,$$ which both have the form \(1^\infty\).
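To illustrate that the form \(1^\infty\) really “might be anything”, here is a short Python sketch comparing three expressions of that form. The two extra examples, \(\left(1+\frac{1}{n^2}\right)^n\) and \(\left(1+\frac{1}{n}\right)^{n^2}\), are my own additions for illustration. It prints the natural log of each power, which sidesteps floating-point overflow; the helper name `log_of_power` is hypothetical:

```python
import math

def log_of_power(base_offset, exponent):
    """Natural log of (1 + base_offset) ** exponent, computed without overflow."""
    return exponent * math.log1p(base_offset)

for n in (10, 1_000, 100_000):
    print(
        n,
        round(log_of_power(1 / n, n), 6),      # tends to 1, so the power tends to e
        round(log_of_power(1 / n**2, n), 6),   # tends to 0, so the power tends to 1
        round(log_of_power(1 / n, n**2), 1),   # grows without bound, so does the power
    )
```

In all three cases the base approaches 1 and the exponent approaches infinity; which one “wins” depends on how fast each does so.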

Let’s work out that limit, \(\lim_{x\to0}[\cos(x)]^\frac{1}{x^2}\). The trick here is the opposite of the last one: We’ll take the log of our function, which we can express by writing the function in exponential form:

$$\lim_{x\to0}[\cos(x)]^\frac{1}{x^2}=\lim_{x\to0}e^{\ln[\cos(x)]^\frac{1}{x^2}}=\lim_{x\to0}e^{\frac{1}{x^2}\ln\cos(x)}=\lim_{x\to0}e^{\frac{\ln\cos(x)}{x^2}}$$

The exponent is now a fraction in the form \(\frac{0}{0}\), so we can apply L’Hospital’s rule to it:

$$\lim_{x\to0}\frac{\ln\cos(x)}{x^2}\overset{L’H}=\lim_{x\to0}\frac{\frac{-\sin(x)}{\cos(x)}}{2x}=\lim_{x\to0}\frac{-\sin(x)}{2x\cos(x)}$$

This still has the form \(\frac{0}{0}\), so we repeat:

$$\lim_{x\to0}\frac{-\sin(x)}{2x\cos(x)}\overset{L’H}=\lim_{x\to0}\frac{-\cos(x)}{2\cos(x)-2x\sin(x)}=\frac{-\cos(0)}{2\cos(0)-2(0)\sin(0)}=-\frac{1}{2}$$

Now we can insert that back into our exponential form:

$$\lim_{x\to0}e^{\frac{\ln\cos(x)}{x^2}}=e^{-\frac{1}{2}}\approx0.6065$$
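As a sanity check on the two rounds of L’Hospital, we can evaluate the original power for small values of *x*. This is only a numerical sketch with Python’s standard `math` module; the name `cos_power` and the sample points are my own:

```python
import math

def cos_power(x):
    """cos(x) ** (1 / x**2) for small nonzero x."""
    return math.cos(x) ** (1 / x**2)

# Approach 0 from both sides; the function is even, so the sides agree.
for x in (0.5, 0.1, 0.01, -0.01):
    print(x, cos_power(x))

print("e^(-1/2) =", math.exp(-0.5))
```

The values close in on \(e^{-1/2}\) from below as *x* shrinks, in agreement with the limit found above.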

Here is the graph, showing the power in green, approaching the limit at 0:

In your solution of the problem of

\(\lim_{x\to\pi/2^+}(\sec x-\tan x)\),

you used the approach of combining 1/cos x and -sin x/cos x into a single fraction, which gives an indeterminate form 0/0, to which L’Hospital’s Rule applies. (You must always convert indeterminate forms into either 0/0 or ∞/∞ in order to apply L’Hospital.) But in this case, you can also just “rationalize” and multiply (1-sin x)/(cos x) by (1+sin x)/(1+sin x) and simplify, leading to a fraction which does not give 0/0 by direct substitution, avoiding L’Hospital.

Here is the work: $$\lim_{x\to\frac{\pi}{2}^+}\frac{1-\sin(x)}{\cos(x)}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{(1-\sin(x))(1+\sin(x))}{\cos(x)(1+\sin(x))}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{1-\sin^2(x)}{\cos(x)(1+\sin(x))}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{\cos^2(x)}{\cos(x)(1+\sin(x))}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{\cos(x)}{1+\sin(x)}=\\\frac{\cos\left(\frac{\pi}{2}\right)}{1+\sin\left(\frac{\pi}{2}\right)}=\frac{0}{2}=0$$

Doctor Rick had mentioned two ways to solve it “that do not involve an expression of the form ∞ – ∞;” he may have meant the two methods we’ve seen (both involving the same transformation), or he may have had something like this in mind: $$\lim_{x\to\frac{\pi}{2}^+}(\sec(x)-\tan(x))=\\\lim_{x\to\frac{\pi}{2}^+}\frac{\sec(x)-\tan(x)}{1}\cdot\frac{\sec(x)+\tan(x)}{\sec(x)+\tan(x)}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{\sec^2(x)-\tan^2(x)}{\sec(x)+\tan(x)}=\\\lim_{x\to\frac{\pi}{2}^+}\frac{1}{\sec(x)+\tan(x)}=0$$ because the denominator becomes (negatively) infinite.

For yet another approach, you may be reminded of the half-angle formula $$\tan\left(\frac{x}{2}\right)=\frac{1-\cos(x)}{\sin(x)}$$ Replacing *x* with \(\frac{\pi}{2}-x\), we see that $$\frac{1-\sin(x)}{\cos(x)}=\tan\left(\frac{\pi}{4}-\frac{x}{2}\right)$$ So $$\lim_{x\to\frac{\pi}{2}^+}(\sec(x)-\tan(x))=\lim_{x\to\frac{\pi}{2}^+}\tan\left(\frac{\pi}{4}-\frac{x}{2}\right)=\tan\left(\frac{\pi}{4}-\frac{\pi}{4}\right)=0$$
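The equivalent forms used in these approaches can be compared numerically. In this Python sketch (standard library only; the three function names are mine), `direct` subtracts two huge numbers, while the other two forms never produce anything large:

```python
import math

def direct(x):
    """sec(x) - tan(x), the original infinity-minus-infinity form."""
    return 1 / math.cos(x) - math.tan(x)

def rationalized(x):
    """cos(x) / (1 + sin(x)), the form obtained by 'rationalizing'."""
    return math.cos(x) / (1 + math.sin(x))

def half_angle(x):
    """tan(pi/4 - x/2), the form obtained from the half-angle formula."""
    return math.tan(math.pi / 4 - x / 2)

# Approach pi/2 from the right.
for h in (1e-1, 1e-3, 1e-5):
    x = math.pi / 2 + h
    print(h, direct(x), rationalized(x), half_angle(x))
```

All three columns agree and shrink like \(-\frac{h}{2}\), confirming the limit of 0.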

Problem 2 above would be much harder to do without L’Hospital.

We’ve looked at the concept of limit of a function from several perspectives, including why they are needed, and what the definition means. Here we have a more fundamental question, which applies to both functions and sequences: What do we mean when we say a value **approaches** some number? More broadly, we’ll see how mathematicians have to rein in language by refining their definitions, and replacing words with symbols. This will be a long one, as we seek a helpful answer.

Anuj sent this question in mid-July:

Why approaching a value means that we can get arbitrarily close to that value?

Given two sets:

x = {3, 2.5, 2.04, 2.03, 2.02, 2.001, 2.0001, …}

y = {4, 6.25, 5.76, 4.25, 4.025, 4.001, 4.00001, …}

As terms of the set x get closer and closer to 2, terms of the set y seem to be getting closer and closer to 4. But terms of set x also seem to be getting closer and closer to “values < 4” like 3.999, 3.8999, etc. Then why do we say the 4 is the value the terms of set y seems to be getting closer and closer to rather then some value <4?

And also, why do we need to check for arbitrary number of terms to prove the existence of a value the terms of a set are getting closer and closer to, rather than just looking at a bounded (finite) number of terms of set? Why can’t we prove the existence of a value the sequence is getting closer and closer just by checking finite number of terms of a sequence rather than looking at arbitrary number of terms of a sequence?

So, my question is that why approaching a value means that we can get arbitrarily close to that value, i.e., why getting closer and closer to a value means that we can get as close as we want to that value?

In book A Mathematical Bridge An Intuitive Journey in Higher Mathematics by Stephen Hewson

Aristotle: I see: all the terms in the sequence beyond a certain point are closer to ½ than any number I can specify. If I were to say that the limit were anything other than exactly one half, then eventually all of the terms beyond some point in your sequence would get closer to one half than they would to my other supposed limit. I thus concede the point: the sequence must tend to the limit of ½.

Here he is saying about the fact that a sequence gets closer and closer to many values until a certain point but diverges after this, it can’t be considered a limit value, but it gets closer and closer to ½ after any arbitrary point: that’s why it is the limit value. Can you explain to me why we must get closer and closer after any arbitrary point to consider it its limit value?

I also want to know what does approaching means and how epsilon and delta explain this idea of approaching a value?

At one level, the initial question could be just about language: “Approach” can mean merely **“moving in the direction of”**, so that his sequence (inaccurately called a “set”) of *y* “approaches” not only 4, but any number less than 4 (as one might “approach” a throne without going all the way up to it); but in this context we take it to mean **“coming very close to”** one specific number. Mathematics chooses a particular meaning of a word and makes it precise; in particular, we don’t really use “approach” as a technical term in itself, but use it only to informally explain what the formal definition (with epsilons and deltas) means.

So, why do we want “approach” to mean this, and how does the formal definition accomplish it?

Doctor Fenton answered, starting at a high level, likely because Anuj had mentioned “real analysis” in his title:

Hi Anuj,

Those are very good questions, and they lead to some very deep concepts.

As for what it means to “approach” something, we have the idea of distance on the number line (and more generally in sets such as the Euclidean plane or space). A distance (called a metric in math) is a function which assigns a unique non-negative number to each pair of elements of the set, with two requirements: (1) the positivity that I mentioned, i.e. if d is our metric, d(x,y) ≥ 0 for all x and y in the set; if x and y are not equal, then d(x,y) > 0, so if d(x,y) = 0, then x = y; and (2) for any three elements x, y, and z, d(x,y) ≤ d(x,z) + d(z,y) (the triangle inequality).

For the real numbers, the usual metric or distance between two numbers a and b is d(a,b) = |a-b|, the absolute value of the difference between a and b.

This is a formal definition of what a “distance” means; “approaching” says something about distance. But what, exactly?

You are talking about the mathematical idea of limits, and there are two different concepts of limits in math: limits of sequences; and limits of functions. These are closely related, but very different in some ways. Mathematically, a sequence is a function from the positive integers to the real numbers, while functions can have other domains. For simplicity, let’s stick to functions defined on a subset of the real numbers whose values are also real numbers.

Since positive integers are a subset of the real numbers, this restriction includes sequences; we will not go beyond real numbers here, though one could.

Now, to say that the numbers in a sequence, \(\{a_n\}\), are “approaching” a number A means that the distance between \(a_n\) and A, \(|a_n-A|\), gets close to 0, as the index n gets larger. (We also have to use the order property of real numbers, which allows us to say that for any two different numbers a and b, one number is less than the other: either a < b, or b < a.)

For sequences, we want to say that \(a_n\) approaches A if \(|a_n-A|\) approaches 0 as n approaches ∞, but “n approaches ∞” is a different kind of “approaching”. n approaches ∞ doesn’t mean that n keeps getting closer to another number, it just means that n keeps getting larger and larger, so that there is no real number which is never exceeded.

Infinity is tricky, so “approaching infinity” needs its own definition!

We still have some clarifying to do! What does “gets close to” mean? It’s not as simple as you might think:

For the idea of a sequence of numbers \(a_n\) approaching a finite value A, the method of approaching can be quite complicated. \(a_n\) could steadily get closer to A, as n increases, such as the sequence \(a_n=1-(1/(n+1))\): \(a_1=1/2\), \(a_2=2/3\), \(a_3=3/4\), \(a_4=4/5\), and so on. Each number is steadily closer to A = 1: \(|a_{n+1}-A|<|a_n-A|\). But we can also create sequences which for one index n get closer to A, but for a larger index may get further away, such as the sequence 2/3, 1/2, 4/5, 3/4, 6/7, 5/6, 8/9, 7/8, 10/11, and so on. The subsequence of odd elements of the sequence, 2/3, 4/5, 6/7, 8/9, … increase steadily towards 1, and so do the even elements 1/2, 3/4, 5/6, 7/8,…, but the whole sequence does not approach 1 steadily.

Here is what the first sequence looks like:

Here each term is closer than the one before.

Here is the second sequence:

This alternates between two increasing sequences, though it could be expressed with a single formula, namely $$\frac{n-(-1)^n}{n-(-1)^n+1}.$$ The terms go back and forth, moving alternately closer and further, yet overall they approach 1. We’ll need a way to define “approach” that fits that!

We could also have a sequence that approaches 1 from both sides:

Here the *distance* from 1 decreases (or stays the same) at each step, though the *direction* alternates.

But all three of these approach 1 as *n* approaches infinity. So our understanding of “approaching” has to work for all of them.
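The difference between the first two sequences is easy to test. Here is a minimal Python sketch using exact fractions; the names `steady`, `zigzag`, and `is_steadily_closer` are my own labels. It generates both sequences from their formulas and checks whether every step reduces the distance to 1:

```python
from fractions import Fraction

def steady(n):
    """First sequence: a_n = 1 - 1/(n+1)."""
    return 1 - Fraction(1, n + 1)

def zigzag(n):
    """Second sequence: (n - (-1)^n) / (n - (-1)^n + 1)."""
    k = n - (-1) ** n
    return Fraction(k, k + 1)

def is_steadily_closer(seq, target, count):
    """True if each of the first `count` steps strictly reduces the distance to target."""
    dists = [abs(seq(n) - target) for n in range(1, count + 2)]
    return all(later < earlier for earlier, later in zip(dists, dists[1:]))

print([str(zigzag(n)) for n in range(1, 9)])
print(is_steadily_closer(steady, 1, 50), is_steadily_closer(zigzag, 1, 50))
```

The first sequence passes the “steadily closer” test and the second fails it, even though both have limit 1. So “each term is closer than the last” cannot be what “approach” means.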

Then, there is the idea of “closeness”, when we say that \(a_n\) is getting close to A as n increases. “Close” is a vague concept, and its meaning depends upon the context. We describe asteroids that pass closer to the earth than the moon as being a “close approach”. The moon is about 400,000 km from earth, so an asteroid passing within 300,000 km of earth is “close”. But if lightning is striking 100 km from your house, you probably don’t consider that “close”, but if it is less than 1 km, you are probably concerned. To have meaning in a given context, we need a standard of “closeness”. In mathematics, that (arbitrary) standard of closeness is denoted ε (epsilon). Now we can say with precision whether \(a_n\) is close to A or not: \(a_n\) is close to A if \(|a_n-A|<\varepsilon\), otherwise it is not close.

We need to be getting *close*, not just *closer*. Since “close” is a relative concept, it has no universal meaning; rather, whatever one chooses to call “close enough” is considered close *for the moment*. That is what epsilon does.

To get to the main point, saying that \(\lim_{n\to\infty}a_n=A\) is saying that if n is large enough, then \(a_n\) will be close to A. To make sense of this, we need to choose a standard of closeness to A, ε, and a standard of largeness for n, N, so that the statement “if n is large enough, then \(a_n\) will be close to A” can be given a precise meaning. Given ε, there is some N such that if “n is large enough”, i.e. n > N, then \(a_n\) will be “close” to A, \(|a_n-A|<\varepsilon\).

For a function limit, \(\lim_{x\to a}f(x)=L\), we want to say that if x is close to a, then f(x) will be close to L. That means that for any given standard of closeness to L, ε, there will be a standard of closeness to a, δ, so that if |x-a| < δ, then |f(x)-L| < ε.

Does this help?

The sequence must not only get *closer*; it must get *as close as you want* – that is, there is no limit to how close it gets. (No pun intended. Well, maybe.)

This is a brief explanation of the epsilon-delta formulation of limits, which is explored in more detail at Epsilons, Deltas, and Limits — Oh, My!

Anuj wasn’t satisfied yet, and restated his question:

Sorry but, I do not understand your answer. I meant to ask as stated above about what is approaching, is it just something **getting closer to something until a certain point** and how epsilon and delta formalize this notion of “approaching” something arbitrarily as limit and also **is limit just a special case of approaching**, where limit is just something toward which approach arbitrarily close.

Doctor Fenton tried again:

The idea of approaching in math is the same as in daily life. You get closer to something. If you are driving from one town to another, you will approach the destination, **the distance between you and the destination is decreasing**.

Note that this wording suggests *always* decreasing, which is not necessarily true in talking about limits. Rather, we mean that *eventually* the distance gets closer, and stays there.

When this is used in math, we are usually talking about an input/output relationship. There is some input which produces an output. You control the input, so you make the input approach something. For a sequence, you choose the index n, so you choose how far out in the sequence to go. For a function, you want the input to approach a number a, so again, you choose how to approach a. The value of x is what you choose, so you can choose how to approach a. If you choose to make x go away from a, then x won’t be approaching a. You can think of a point sliding along the graph of the function, **getting closer and closer to the desired value a**. You control the value of x, so you can make it approach a or not approach a.

The claim that \(\lim_{x\to a} f(x) = L\) claims that the “closeness” of x to a causes the closeness of f(x) to the limiting value of L.

So we can move the input (*x* or *n*) toward the goal (*a* or ∞), and then look at how the output (\(f(x)\) or \(a_n\)) approaches the expected limit, which we don’t have direct control over.

An example I like to use is to suppose that you are sitting at a computer where you can type inputs, and the computer displays an output. Other people will give you a number, and you can choose to type the number into the computer or not. You control the input. Suppose that the computer will not allow you to input the exact value 3, but only numbers close to 3.

**An output inspector looks at the output to decide if it is “acceptable”.** You are claiming that if the input is close to 3, then the output will be close to 9. There are two standards of closeness being used: the **closeness of the given number x** to the desired input 3, which will be denoted by **δ**, and which you will use to decide whether to type the number into the computer, and the **standard of closeness the output inspector will use**, denoted by **ε**, to decide whether its output is acceptable.

**Can you guarantee that an input you allow will always produce an acceptable output?**

If you tell the output inspector what input standard you are using, and if the inspector wants you to fail, then she can always guarantee that you will input a value that will produce unacceptable output. The inspector just needs to find a value which meets the input standard whose output does not meet the output standard.

This idea of an antagonist trying to make you fail is useful in thinking about proofs. To show that you can *always* succeed (something is always true) you need to show that an enemy can’t *force* you to fail. Here, if the enemy knew your standard of closeness, then she could choose an output standard by which some inputs you would permit, would not work.

For example:

Suppose the output inspector knows that the computer produces the output by doubling the input and adding 3. That is, if x is the input, then the output will be 2x+3. If you are given the number x, then you will determine the “input error” |x-3| and compare it to δ. If |x-3| < δ, you type it in. The “output error” will be |(2x+3)-9| = |2x-6| = 2|x-3|. That is, the output error will always be twice the input error. If you tell the output inspector that δ=0.1, then she knows that any input x between 2.9 and 3.1 will be allowed to go into the computer. But if 2.9 < x< 3.1, then 8.8 < 2x+3 < 9.2. If the inspector chooses ε = 0.1, then if the input error is larger than 0.05, the output error will be larger than 0.1, so the output will not be acceptable. For example, if x = 3.06, |x-3| = |3.06-3| = 0.06 < 0.1, so x = 3.06 passes the input standard, but the output 2(3.06)+3 = 9.12, which is unacceptable because |9.12-9| = 0.12 > 0.1.

So the decision about what counts as close must be *first* made by the inspector (closeness of the output); we can’t let the inspector choose epsilon knowing our delta!

So this is why epsilon comes first in our definition of a limit:

On the other hand, if the output inspector tells you ε, then since the output error is twice the input error, if you choose your input standard δ so that 2δ ≤ ε, then any input x meeting |x-3| < δ will produce an output whose error is |(2x+3)-9| = 2|x-3| < 2δ ≤ ε.

That is, no matter what output standard the inspector chooses, **you can ALWAYS find an input standard δ** which will ALWAYS produce acceptable output.

This is the key to the epsilon-delta definition of a limit, though this is an unusually simple example.
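The inspector game can be tried out numerically. The following is only a sketch under stated assumptions: the function names (`input_allowed`, `output_acceptable`, `choose_delta`, `all_samples_acceptable`) are made up for illustration, and checking a finite list of sample inputs illustrates the claim but does not prove it.

```python
# Sketch of the input-standard / output-inspector game for the rule 2x + 3 near x = 3.
# All function names are illustrative assumptions, not part of the original answer.

def f(x):
    """The computer's rule: double the input and add 3."""
    return 2 * x + 3

def input_allowed(x, delta, a=3):
    """Input standard: x is within delta of a, but is not a itself."""
    return 0 < abs(x - a) < delta

def output_acceptable(x, epsilon, limit=9):
    """Inspector's standard: the output is within epsilon of the claimed limit."""
    return abs(f(x) - limit) < epsilon

def choose_delta(epsilon):
    """Our reply to the inspector: any delta with 2 * delta <= epsilon works."""
    return epsilon / 2

def all_samples_acceptable(epsilon, steps=1000):
    """Check evenly spaced allowed inputs on both sides of 3 against epsilon."""
    delta = choose_delta(epsilon)
    offsets = [delta * k / (steps + 1) for k in range(1, steps + 1)]
    samples = [3 - d for d in offsets] + [3 + d for d in offsets]
    return all(input_allowed(x, delta) and output_acceptable(x, epsilon)
               for x in samples)

# Delta announced first (0.1); the inspector then picks epsilon = 0.1.
# The input 3.06 is allowed, but its output is rejected.
print(input_allowed(3.06, 0.1), output_acceptable(3.06, 0.1))

# Epsilon announced first; we answer with delta = epsilon / 2.
for eps in (0.1, 0.01, 0.001):
    print(eps, all_samples_acceptable(eps))
```

The order of the two choices is the whole point: in the first case the inspector reacts to our δ, and in the second we react to her ε.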

You always have to consider relationships where you have control of the input, so the question is **whether your ability to control the input will allow you to control the output**. Consider the function f(x) whose output is 1 when x is irrational and 0 when x is rational. At any point a, if ε ≤ 1/2 (e.g. ε=0.1), then no matter what restriction you place on x, |x-a| < δ, i.e. x is in the interval a-δ<x<a+δ, there will be points in (a-δ,a+δ) where f(x) = 0 and points where f(x)=1, so f cannot approach any fixed value L.

Does this help?

That is an example where the limit does not exist, because the output varies too wildly.

Anuj answered,

I think I understand but just to clarify: **approaching** means getting closer to something until a certain point and **limit** is just a special way of approaching where the value we approach **arbitrarily close** is the limit and I read that this is how limits were discovered, that is the limit is defined by the value we get arbitrarily close to because that is something that will help find derivative at a point. Is this the correct interpretation of limit as the value we approach **until a certain point** and that is arbitrarily close, is that it?

Approaching arbitrarily closely is indeed the key idea: “Approaching” here doesn’t just mean going in that direction, but getting as close as the “output inspector” chooses to require.

Doctor Fenton replied:

I don’t understand what you mean when you say “approaching means getting closer to something **until a certain point**”. What does “until a certain point” mean?

There are two approaches occurring: in the limit of a sequence \(a_n\), the index n “approaches ∞” while the numbers \(a_n\) approach A. That is a “dynamic” description: we think of either the **index moving** along a list of the positive integers in the sequence case, while the **\(a_n\) move** along the real axis, or **x moving** along the x-axis and **y moving** along the y-axis in the function case. (n approaching ∞ doesn’t mean that n is approaching any specific number, it just means that n steadily gets larger **without bound**). That is, no matter how large a positive integer N we choose, the index n will get larger than N, n > N, and stay larger than N (once n is larger than N, all indices larger than n will also be larger than N).

Here is our sequence \(b_n\), seen as moving to the right (*n* “approaching” infinity) and, irregularly, up toward the value \(A=1\):

Here, \(\epsilon=0.22\), and we see that for \(n>4\), \(|b_n-1|<\epsilon\). But it isn’t moving constantly up toward 1; in fact, it moved inside the green region, then back outside before staying in.

The same sort of thing would be happening with a function, as *x* approaches *a*. It might wiggle, but \(f(x)\) will always be coming within whatever bounds you choose.

It’s hard to describe this dynamic motion directly, and that is what the **“arbitrarily” close** is meant to capture. Instead of describing some type of moving, we use the arbitrariness of the standard of closeness instead. You can think of “approaching” as describing having **a sequence of such standards**, each more restrictive. For example, the first standard will be \(\epsilon_1\), the second standard \(\epsilon_2\), and so on, with each standard smaller than the previous one: \(\epsilon_1 > \epsilon_2 > \epsilon_3 > \epsilon_4 > \dots\) (and this sequence approaches 0). To say that \(a_n\) approaches a limit A means that for each standard \(\epsilon_k\), there will be an index \(N_k\) such that for **all** indices n such that \(n > N_k\), \(a_n\) will be in the interval \((A-\epsilon_k, A+\epsilon_k)\). That is, \(n = N_k\) may not be the first time \(a_n\) gets inside the interval \((A-\epsilon_k, A+\epsilon_k)\): it could have “two steps forward, one step back” behavior, moving closer to A, but then backing away, but there will be an index \(N_k\) which will be the last time \(a_n\) is outside the interval. After that, all the \(a_n\) will be inside the interval. Then we reduce the size of ε to \(\epsilon_{k+1}\) and find \(N_{k+1}\) and repeat that process.

**That is what is meant by “approaching (arbitrarily) close”.** We don’t have to talk about something “moving”.

Note how the idea of “approaching” had to be refined in order to define it clearly, and in doing so it has lost the literal idea of motion.
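The search for each \(N_k\) can be sketched in code. This assumes that \(b_n\) is the alternating sequence \(\frac{n-(-1)^n}{n-(-1)^n+1}\) given earlier; the function names and the finite search bound are illustrative assumptions. A finite scan cannot prove that no later term leaves the interval, but for this particular sequence \(|b_n-1|\le 1/n\), so scanning past \(1/\epsilon\) is enough.

```python
from fractions import Fraction

def b(n):
    """Assumed formula for b_n: the alternating sequence approaching 1."""
    s = (-1) ** n
    return Fraction(n - s, n - s + 1)

def last_index_outside(epsilon, limit=1, search_up_to=1000):
    """Largest n <= search_up_to with |b_n - limit| >= epsilon (0 if there is none).

    The scan is finite, so it is only trustworthy here because
    |b_n - 1| <= 1/n, which puts every term past 1/epsilon inside the interval.
    """
    last = 0
    for n in range(1, search_up_to + 1):
        if abs(b(n) - limit) >= epsilon:
            last = n
    return last

# A shrinking sequence of standards epsilon_1 > epsilon_2 > ... and the matching N_k.
for eps in (Fraction("0.22"), Fraction("0.1"), Fraction("0.05"), Fraction("0.01")):
    print(float(eps), last_index_outside(eps))
```

For \(\epsilon=0.22\) the scan reports 4, matching the picture: the term at n = 3 is inside the band, the term at n = 4 is back outside, and everything after that stays in.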

This idea applies to relationships: for a **sequence**, the relationship between the size of the index n and the size of the sequence value \(a_n\); for **functions**, the size of x and the size of y=f(x).

Anuj wrote back,

OK, I understand now.

Thank you for your assistance.

It took well over a hundred years for the concept of limits to develop as a justification for what we do in calculus; we can see why the concept is difficult!

Real life questions of probability often require information that we don’t have – they become a job for statistics instead. But sometimes just trying some plausible numbers, as in a Fermi problem, can yield interesting results. Here we consider the probability of an injury when kids play near a baseball field. Then we’ll jump from that to an older question about masks and COVID, using actual data. In both cases, we end up looking at long-term risks.

The first question I want to look at came in late June:

Hi,

We have a **play park** right behind our **baseball field**.

The play park is about **16 000 sq ft**.

The city wants to close the baseball field because they are afraid a child will get hurt.

What are the possibilities of a softball hitting a child in this area with a home run?

I’d say a **child** occupy 4 sq ft and a **softball** might be .10 sq feet?

To visualize this, suppose that the baseball field extends about 300 feet in each direction from home plate at the lower left corner (bases being 90 feet apart), for a total of 90,000 square feet, and the play area measures 300 by a little more than 50 feet. Here is a possible layout:

We don’t know the actual shape or placement of the play area, so this is just conceptual. I didn’t make such a model before answering, and in fact envisioned the play area being closer to the size of the ballfield, with only part being near the outfield. What I’ve drawn here is perhaps the worst case.

I answered, skeptical of the whole question:

Hi, Michael.

A question like this really can’t be answered without statistical data, such as **how many children** are in the park during a typical game, the **distribution of locations** where home runs will land, and more.

These missing details are very important in reality. If the play area were packed solid with children, and if batters habitually hit home runs over the fence, the situation would be far worse than if there were one kid sitting at the far corner, and children batting who could barely hit into the outfield!

I started out just trying to demonstrate these ideas by making up some totally arbitrary facts that would be somewhat plausible:

For example, suppose that during a game you can expect **10 home runs** that would land in the play area, which would be **uniformly distributed over half of that area** (the other half being too far away). And suppose that there are **20 children** uniformly distributed over the entire play area during the game.

Then we could estimate (roughly) that children take up about 20 4-ft² patches out of 16 000/2 = 8000 ft², consisting of 8000/4 = 2000 such patches, on which a ball might land. So the probability that **any one ball** would land on a child would be 20/2000 = 0.01 = **1%**.

The assumption of half the play area being too far away to be hit is not reasonable based on my drawing, but seemed so at the time; and I mostly wanted to illustrate the idea of a (non-uniform) distribution. The suggested size of a child is probably too large based on a top view, but perhaps too small considering the ball would be coming in at an angle. My calculations suggest that each child might take up about 1/2000 = 0.05% of the area the ball might hit, so that each ball that reaches the play area has that much chance of hitting that particular child. With each child on a different spot, the probability of the ball hitting someone is \(20\times0.05\%=1\%\). (It was entirely by chance that my randomly chosen numbers worked out so nicely, but that motivated me not to adjust them!)

Out of 10 such hits, the probability that **none** would hit a child would be 0.99^10 ≈ 0.90 = 90%. So the probability of a child being **hit** during a given game would be 10%. If there were 100 games during a season, the probability of **no** child being hit becomes 0.90^100 = 0.000043, and the probability of **at least one** being hit would be 0.999957 – almost certain.

That 1% sounds like a small probability; but for something you really, really don’t want to happen, you need to take into account how many opportunities there are. My proposals of 10 long hits per game, and 100 games per season, are probably too high, but … it made a point. (With only 5 hits into the play area per game and 50 games per season, the probability becomes \(1-0.99^5=0.049=4.9\%\) per game, and \(1-0.951^{50}=0.919=91.9\%\) for the season, which is still significant.)
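The whole estimate can be redone in a few lines, which makes it easy to try other guesses. This is a minimal sketch using the made-up figures above (20 children, 4 ft² each, 8000 ft² reachable); the function and variable names are my own for illustration, and it uses unrounded intermediate values.

```python
def p_at_least_one(p, n):
    """Chance of at least one success in n independent trials, each with chance p."""
    return 1 - (1 - p) ** n

# Made-up inputs from the estimate above (assumptions, not measured data).
children = 20
child_area_sq_ft = 4
reachable_area_sq_ft = 16_000 / 2

# Chance that one ball landing in the reachable area lands on some child.
p_ball = children * child_area_sq_ft / reachable_area_sq_ft

def season_risk(hits_per_game, games_per_season):
    """Return (per-game risk, per-season risk) of at least one child being hit."""
    p_game = p_at_least_one(p_ball, hits_per_game)
    p_season = p_at_least_one(p_game, games_per_season)
    return p_game, p_season

for hits, games in ((10, 100), (5, 50)):
    p_game, p_season = season_risk(hits, games)
    print(f"{hits} hits/game, {games} games: "
          f"{p_game:.3f} per game, {p_season:.6f} per season")
```

Changing any one input (fewer children, a smaller reachable area, fewer long hits) shows immediately how sensitive the season-long figure is to the guesses.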

I didn’t do this as a serious answer to your question, because my assumptions are intended only as wild guesses, and probably contrary to fact; but it interestingly demonstrates that the **probability can increase considerably as the number of opportunities increases!**

I was expecting a small chance, but instead I think I’ve shown why they would want to be cautious. Not allowing games when children are present, or not allowing children during games, might be very reasonable, IF there are many hits into a large enough part of the play area.

Those are the sort of statistics you’d need to gather. Has a hit **ever** gone into that area in the past? How often?

Questions of safety can be tricky. The real probabilities may well be too low to be a major worry, but we’ve seen enough to want to be cautious, and gather the additional data – or just make small changes to the rules.

Michael didn’t reply, but did indicate that the answer was helpful.

That question about risk reminds me of a question from last December, from Laura in Germany:

I am wondering about the probability of getting infected with COVID when you wear a mask. A recent study said that if you wear a good mask but it’s not tight around your nose, the probability of getting infected if you are with an infected person for 20 minutes is 4%. (There are more details but that would make this too long). That seems like a **small risk** to me. But my husband said I have to **add the risk for every time** we meet people. So if we go shopping once a week it would be 4% times 52 weeks after a year. A **very big risk**. That seems crazy to me. But he’s a scientist so he’s probably right. Can you clarify?

This gives us another chance to do the same kind of calculation we did above, and to dig in a little deeper: How do you find the long-term risk when you know the risk for each exposure? How much is it affected by changes in parameters?

This time we have real statistics to start with.

I answered:

Hi, Laura.

The details you are ignoring may change the ultimate conclusions considerably; but I’ll just look at your specific question.

We will **assume** that **each week** you spend enough time close enough to one infected person that the probability that you will catch the infection **on that occasion** is 4%.

Your husband is wrong about the specific calculation to determine the probability of being infected in one of 52 such incidents; you can’t just add probabilities. (In particular 52*4% = 208%, which is impossible as a probability! The highest a probability can be is 100%, which is 1.)

But the probabilities do “add up”, figuratively.

The calculation he suggested was a rough first estimate, which we can fix:

Here’s the correct calculation (again, assuming the data are correct):

The probability that you will get infected **at least once** is 1 minus the probability that you will **not** be infected on **any** of the occasions.

The probability that you will **not** be infected on **any one** occasion is 1 – 0.04 = 0.96 (that is, 96%). The probability that this will happen **52 times in a row** is \((0.96)^{52} = 0.1197\).

So the probability that you will be infected **at least once in the year** is 1 – 0.1197 = 0.88 (that is, **88%**).

Let’s turn this into a general formula. Suppose that the probability of “success” (which in our case means getting sick!) at any one “trial” is \(p\). Then the probability of “failure” is \(1-p\); and if the trial is repeated \(n\) times, the probability of failure every time is \((1-p)^n\). The probability of **at least one** success is the complement of **no** success, namely \(1-(1-p)^n\). Here is a graph of this in our case, with \(p=0.04\), as \(n\) increases up to 104 (two years of weekly events):

The blue line shows what would happen if we just multiplied by \(n\), using the naïve guess about the effect of repeated exposure. Reality looks quite different, but still rises inexorably.
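The comparison between the naïve product and the formula \(1-(1-p)^n\) can be tabulated directly. This is a small sketch with function names chosen for illustration; the sample values of \(n\) are arbitrary apart from 52 and 104, which are one and two years of weekly events.

```python
def naive_sum(p, n):
    """The tempting (wrong) estimate: just add p once per exposure."""
    return n * p

def at_least_once(p, n):
    """Probability of at least one 'success' in n independent trials."""
    return 1 - (1 - p) ** n

p = 0.04  # per-exposure risk assumed in the question
print(" n   naive   actual")
for n in (1, 5, 13, 26, 52, 104):
    print(f"{n:3d}  {naive_sum(p, n):6.2f}  {at_least_once(p, n):6.3f}")
```

For small \(n\) the two columns nearly agree, since \(1-(1-p)^n\approx np\) when \(np\) is small; by \(n=52\) the naïve figure has passed 1 while the actual probability is about 0.88.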

So, yes, the probability increases with each exposure; and it does eventually get quite large. If you had provided the article, I might be able to check out other mathematical issues in it, but I’m sure we both know there are a lot of complexities to be kept in mind. In particular, I doubt that you spend 20 minutes that close to any one person at the store. But that’s not my area of expertise.

We’ll see the article in a moment.

Laura replied:

Thanks so much. I can send a link to the study if you are interested but it’s in German (at least the link I have). I agree that in reality the risk would be much smaller. In the study itself, they say that **under real conditions** the risk would be 10 to 100 times smaller!

I was interested in the risk calculation in theory. I didn’t realize one needed to add them at all. That applies to all areas of life, then. Driving, for example. Kind of scary.

So the study’s numbers are very conservative, even given the assumption of spending 20 minutes with a contagious person. We should be concerned, but not terrified (as long as we are masked!). And, yes, that’s true for all of life. *Everything* we do has *some* risk, and if we do anything often enough, the risk adds up. We have to make decisions based on how serious the consequences are (for ourselves and others), and avoid doing risky things too often.

I answered:

Yes, risks “add up” when you do **anything** enough times. Though it isn’t literal addition, it does grow.

On the other hand, the same can be said of **benefits**!

Just to take this one step further, if the risk was really 0.4% per occasion, 1/10 as much, then after 52 times, the total risk would be \(1 - (1 - 0.004)^{52} = 0.188 = 18.8\%\).

It could be interesting to see the study, though there are enough of them around. If you feel like it, send me the link and I’ll at least let Google translate it for me (though I did take German in college many years ago, and so did my daughter).

Here is the graph for this 0.4% risk for a poorly fitting mask:

At such a small probability, the increase is much closer to linear (that is, to literally adding up); after two years, it is still less than 35%. This illustrates how modifying the assumptions can affect our results.

Now think about what I said about the **benefits** also adding up. Since wearing a good mask reduces the probability of getting sick on one occasion considerably, it is even more valuable over multiple exposures, and more so when more people wear them. So is vaccination. Every measure you take helps. In particular, I’m reminded of the Swiss Cheese Model.

Laura wrote back:

Hi Doctor Peterson. I figure I can just summarize the study in a few sentences. (By the way: I live in Germany which is why I am reading studies in German). This one is from the Max Planck Institute (Eberhard Bodenschatz). They say that if you wear a FFP2 or KN95 mask (I don’t know if you call them that in the US), and you have an infected person and a healthy but unvaccinated person in close proximity indoors, the risk of the healthy person getting infected after 20 minutes is 0,1% – **if the mask is worn properly** – ie if the metal thing is pushed tightly against the nose. If the mask isn’t worn properly, the risk rises to 4%. If you wear the **normal surgical mask** (the blue ones – at least here they’re blue), the risk is 10%. And if you wear **no mask at all**, the risk is 90% after only a few minutes, even at 3 meters distance. The researchers stress, however, that the estimates are very conservative. In normal life situations, the risk is “surely” 10 to 100 times less.

It’s worth noting that while our original question was about why a 4% risk is not as low as it seems, the point of the article is that even a poorly worn mask is better than none, and a well-fitted mask is far more effective than the researchers expected.

Thinking about your answer about adding up risks (I know: not literally adding up), it makes perfect sense. If I drive drunk once, I might be OK but if I drive drunk often – bad idea. I just hadn’t really thought about it. We talk a lot about the fact that **normal people are so unaware of statistics** (I don’t think it was taught in school at my time) and how important it is. I will keep trying to inform myself!

Apart from the mathematical details, what we’ve been discussing about repeated risks should be common sense; but it is easy to overlook!

I replied:

Thanks. A quick search turned up this article that appears to be an English language version of the same report:

https://www.mpg.de/17916867/coronavirus-masks-risk-protection

I work in a tutoring center at my school, which requires everyone to be masked, and we sit (mostly) on opposite sides of wide tables. But many people wear surgical type masks very loosely, and I often wonder what good it is to require masks at all. But the article does say, “Surgical masks already reduce the amount significantly, even if they fit poorly.”

Maybe in addition to “risks add up”, we should add, “**any reduction can help**.”

Here are some quotes from the article:

The Göttingen study confirms that FFP2 or KN95 masks are particularly effective in filtering infectious particles from the air breathed – especially if they are **as tightly sealed as possible at the face**. If **both** the infected and the non-infected person wear well-fitting FFP2 masks, the maximum risk of infection after 20 minutes is hardly more than **one per thousand**, even at the shortest distance. If their masks **fit poorly**, the probability of infection increases to about **four percent**. If both wear **well-fitting medical masks**, the virus is likely to be transmitted within 20 minutes with a maximum probability of **ten percent**.

Here is our graph for all four cases (using their conservative numbers):

As before, the horizontal axis is the number of 20-minute interactions, and the vertical is the probability of at least one infection. A good fit makes a big difference.

The infection probabilities determined by the Max Planck team indicate the upper limit of the risk in each case. “In daily life, the actual probability of infection is certainly **10 to 100 times smaller**,” says Eberhard Bodenschatz. This is because the air that flows out of the mask at the edges is diluted, so **you don’t get all the unfiltered breathing air**. “But we assumed this because we can’t measure for all situations how much breathing air from one mask wearer reaches another person, and because we wanted to calculate the risk as conservatively as possible,” Bodenschatz explains. “Under these conditions, if even the largest theoretical risk is small, then you’re on the very safe side under real conditions.”

Here is our graph for all four cases assuming the probabilities are 1/10 of their worst case:

Even here, without a mask, a year of weekly interactions leads to almost certain infection; a well-fitting KN95 gives impressive protection.
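The numbers behind both sets of curves can be reproduced with the same formula. This is a sketch, assuming the four per-exposure risks from Laura’s summary of the study and one 20-minute exposure per week; the labels and function names are my own.

```python
def at_least_once(p, n):
    """Probability of at least one infection in n independent exposures."""
    return 1 - (1 - p) ** n

# Per-exposure upper-limit risks as summarized above (20 minutes near an infected person).
per_exposure_risk = {
    "well-fitted FFP2/KN95": 0.001,
    "poorly fitted FFP2/KN95": 0.04,
    "surgical mask": 0.10,
    "no mask": 0.90,
}

def yearly_risk(p, scale=1.0, exposures=52):
    """Risk over a year of weekly exposures, with the per-exposure risk scaled."""
    return at_least_once(p * scale, exposures)

print(f"{'case':26s} {'conservative':>12s} {'1/10 of that':>13s}")
for label, p in per_exposure_risk.items():
    print(f"{label:26s} {yearly_risk(p):12.3f} {yearly_risk(p, 0.1):13.3f}")
```

Scaling every risk by 1/10 is just the optimistic end of the “10 to 100 times smaller” remark taken at face value; passing `scale=0.01` gives the other end.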

Although the detailed analysis by the Max Planck researchers in Göttingen shows that tight-fitting FFP2 masks provide 75 times better protection compared to well-fitting surgical masks and that the way a mask is worn makes a huge difference; even medical masks significantly reduce the risk of infection compared to a situation without any mouth-nose protection at all.

Laura responded:

Cool. And thanks for the English link. I can think of a few people who might be interested in it. I know plenty of teachers, for example. (Although: maybe they would be annoyed — hmmm… have to think about that one)

The fact that masks worn tightly is so efficient is good news to me. That means that in highly uncertain situations, I can wear one and feel quite safe. And the fact that loosely worn masks help somewhat is good news too. It means I can feel fairly safe then too. Like just running into the drugstore quickly.

The COVID era is teaching lots of us about math and statistics. Exponential growth is a good one. For most of us that was just a theoretical idea until now. Suddenly it is really important to understand.

Anyway — I’m glad I discovered this site. I’m pretty sure my husband knew you don’t add up things literally but we were in an argument so he probably fudged a bit to try to convince me. Ah, life!

Until my next statistical emergency!

Laura

There is certainly far more that could be said in practical terms, given more data; but our goal here is just to think about how short-term risks become long-term risks, not to decide policies or judge the accuracy of a study. As I said two years ago, our goal is to illustrate math, not to analyze reality.

A couple recent questions involved related subtleties in probability and combinatorics. Both were about apparent conflicts between similar problems involving cards and dice.

This very detailed question came from Alexander in early July:

Hello!

I have a question regarding the probability of getting one pair in poker and Yahtzee. More precisely, I am having trouble with the phenomenon called double-counting, when the same event is counted more than once. I know how to calculate the **probability of getting a one-pair in poker**. However, when I try to calculate the **probability of getting a one-pair in Yahtzee**, I get an incorrect answer which I believe has to do with my confusion with double counting. Therefore I will first calculate the probability of getting a one-pair in poker, then do the same thing for Yahtzee and hopefully, you can observe where I went wrong.

Before I show the rest of the question, I want to mention that he is not concerned with how one actually gets the cards in poker or the numbers in Yahtzee, which in both cases can involve more than one step and complicate the probabilities. It will just be assumed that in poker, 5 distinct cards are chosen randomly from a deck of 52, and in Yahtzee, 5 six-sided dice are rolled once.

In poker, one pair means a hand containing two cards with the same number (here, 5), all other cards having different numbers:

To my knowledge, “one pair” plays no role in Yahtzee, but would mean exactly two of one number (5 again), with all other dice having different numbers:

As for double-counting (or overcounting), we’ve mentioned it, for example, in Permutations and Combinations: Undercounts and Overcounts, Arranging Letters with Duplicates, and Interpreting and Solving a Counting Problem. It means that we have counted the same outcome more than once, and need to either compensate, or find a way to avoid it. The key is to recognize whether you’ve done it!

Now, Alexander carefully showed his thinking about each problem:

Poker (An ordinary deck of cards with 5 cards in a hand):

(C(13,1) x C(4,2) x C(12,1) x C(11,1) x C(10,1) x 4^3)/3!

First, I take **1 of the 13 values** in which I have a pair, and **their suit** can be combined in C(4,2) ways. Then I choose **the third, fourth and fifth cards**, all of which can be combined with 4 suits, thus 4^3. **To account for double-counting, I divide by 3!.** For example, if my third card is six of spades, my fourth is seven of spades and my fifth is eight of spades, that is the same hand as if my third card was eight of spades, my fourth card was seven of spades and my fifth card was six of spades. These three cards can be arranged in 3! ways, and since order does not matter, **I divide by 3! to remove duplicates**. Finally I divide with the total number of hands which is C(52,5) and get **approximately 42%**. Now on to Yahtzee.

Yahtzee (5 6-sided dice):

I want to use the same method I did when I calculated the poker-version.

(C(6,1) x C(5,2) x C(5,1) x C(4,1) x C(3,1))/3!

I choose **one value** out of six, choose **two dice** then pick the **last three dice**. Then, the same way I did in poker, I **divide** with 3!, since a composition of 5-5-1-2-3 is the same as 5-5-3-2-1. Then I divide with the total number of dice-rolls which is 6^5. However, **this answer is wrong**. To correct my mistake, one would have to remove the 3!, but how then do you account for double-counting? If you could help me understand **why I should ignore double counting in Yahtzee, but not in Poker**, I would be grateful.

Best regards,

Alexander

He demonstrated the overcounting with cards; is there any with dice?
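Before reasoning about it, the dice count can be checked by brute force, since there are only \(6^5=7776\) ordered rolls. This sketch assumes “one pair” means exactly two dice showing one number and the other three all different, as described above; the names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

def is_one_pair(roll):
    """True if exactly one value appears twice and three other values appear once."""
    return sorted(Counter(roll).values()) == [1, 1, 1, 2]

# Every ordered outcome of five distinguishable six-sided dice.
rolls = list(product(range(1, 7), repeat=5))
brute_force_count = sum(is_one_pair(r) for r in rolls)

# Alexander's expression, with and without the division by 3!.
without_division = comb(6, 1) * comb(5, 2) * 5 * 4 * 3
with_division = without_division // factorial(3)

print(len(rolls), brute_force_count, without_division, with_division)
print(brute_force_count / len(rolls))
```

The enumeration agrees with the version that omits the division by 3!, which confirms Alexander’s observation that the division is the step that breaks the dice calculation.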

I answered:

Hi, Alexander.

Nice question!

I’ll start out by saying that my first impression was that **cards and dice behave very differently** (because there’s only one of each **card**, while **dice** are independent), so I didn’t think the work would be anything like the same. And I think the ultimate answer is going to be that you are being fooled by a **superficial similarity** between them. Since subtlety is a primary characteristic of combinatorics, that makes this a really nice question to dig into!

**Cards** are discrete objects, so that there is only one of each; this makes permutations and combinations (which involve selecting and/or arranging **distinct items**) appropriate. Once you have a four of diamonds, you know the next card can’t be the four of diamonds (though it could be *another* four). On the other hand, they can be thought of either as **ordered** (the order in which you get them) or **not** (just the set in your hand), so either permutations or combinations can be used.

**Dice** don’t mind having the same *number* in different places; so there is no permutation involved. Their values are **independent**. On the other hand, each *die* is distinct; we often emphasize this by imagining each die being a different color, or tossing each die in a separate place. So in some sense there is an inherent **order** to them.

But in combinatorics, each tool might be used in any problem, as we’ll see. The two problems do turn out to involve similar-looking work after all.

So first, let’s think about exactly what you are doing with the cards:

(C(13,1) × C(4,2) × C(12,1) × C(11,1) × C(10,1) × 4^3)/3!

Restating your explanation, you are doing this:

- Choose which **number** to have a pair of: C(13,1) ways
- Choose which two **suits** the pair will be: C(4,2) ways
- Choose distinct **numbers** for the three other cards, in order: 12×11×10 ways
- Choose the **suits** of these three cards: 4^3 ways
- Account for the fact that **order doesn’t count**, by dividing by 3!

This counts **subsets** of cards, not taking order into account; so you divide by the number of ways to choose an unordered subset of 5 cards, C(52,5).

Most of the combinations in his work don’t really need to be written that way; we could simply say that there are 13 numbers to choose from, then \({4\choose2}=6\) pairs of suits, and then \(12\times11\times10=1320\) ways to choose three numbers from the 12 remaining, and 4 choices for each of those suits. Then he divides by the number of ways to arrange the same three cards, since everything else is combinations.

He finds the probability as the number of **sets** of five cards that contain a single pair, over the total number of **sets** of five cards; probability can be calculated as combinations over combinations, which works best here, or as permutations over permutations.

I had one optional suggestion, letting combinations eliminate the overcount:

The one difference in my own way of thinking is that (12×11×10)/3! can also be thought of as choosing a subset of 3 of the remaining 12 card numbers, namely C(12,3). So the calculation can be alternatively written as

C(13,1) × C(4,2) × C(12,3) × 4^3 = 1,098,240

Dividing this by C(52,5) = 2,598,960, we get 0.4226.
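As a quick sanity check on the arithmetic, here is a minimal Python sketch (not part of the original exchange; the variable names are illustrative) that evaluates the same product with the standard library's `math.comb`:

```python
from math import comb

# Hands with exactly one pair, counted as unordered sets of 5 cards.
pair_rank = comb(13, 1)     # which number is paired
pair_suits = comb(4, 2)     # which two suits the pair uses
other_ranks = comb(12, 3)   # three distinct other numbers, order ignored
other_suits = 4 ** 3        # one suit for each of the three other cards

one_pair_hands = pair_rank * pair_suits * other_ranks * other_suits
all_hands = comb(52, 5)     # all unordered 5-card hands

print(one_pair_hands, all_hands, round(one_pair_hands / all_hands, 4))
```

Because `comb(12, 3)` already ignores order, no separate division by 3! is needed in this form.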

This is all correct; I even checked the answer before moving on, by looking it up here:

https://en.wikipedia.org/wiki/Poker_probability#Frequency_of_5-card_poker_hands

Here is what that page says:

Their calculation is identical to ours, not just the number.

Now, **why** does order not count?

Mostly because none of the choices we made involved *locations* of the cards; we just chose which cards to *include*. It would, in fact, be possible to do the work taking order into account. We might do it this way:

- Choose which **number** to have a pair of: 13 ways
- Choose which **two cards** will have that number: C(5,2) ways
- Choose a **suit** for each of them, in order: P(4,2) ways
- Assign **numbers** to the three other cards, in order: 12×11×10 ways
- Assign **suits** to these three cards, in order: 4^3 ways
- Divide by the number of ways to select 5 cards **in order**, P(52,5)

This gives $$\frac{13\cdot{{5}\choose{2}}\cdot_{4}\!\text{P}_{2}\cdot12\cdot11\cdot10\cdot4^3}{_{52}\text{P}_{5}}$$ $$=\frac{13\cdot10\cdot4\cdot3\cdot12\cdot11\cdot10\cdot64}{52\cdot51\cdot50\cdot49\cdot48}$$ $$=\frac{131,788,800}{311,875,200}\approx0.422569,$$ just as before. The important thing is consistency: If any calculation takes order into account, then *all* must.
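The consistency claim can be checked exactly. The following sketch (my own illustration, using `fractions.Fraction` so that no rounding is involved) compares the ordered count over P(52,5) with the unordered count over C(52,5):

```python
from fractions import Fraction
from math import comb, perm

# Unordered: sets of cards over sets of cards.
unordered = Fraction(comb(13, 1) * comb(4, 2) * comb(12, 3) * 4**3,
                     comb(52, 5))

# Ordered: every choice tracks position, so the denominator must too.
ordered = Fraction(13 * comb(5, 2) * perm(4, 2) * 12 * 11 * 10 * 4**3,
                   perm(52, 5))

print(unordered == ordered, float(ordered))
```

Both numerator and denominator of the ordered version are exactly 5! = 120 times those of the unordered version, which is why the two fractions agree.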

Now, how about the Yahtzee version? (I took a moment to check the rules for Yahtzee, which I probably haven’t played in 50 years, and decided that you are just talking about **rolling 5 dice once**, and seeing if there is a single pair, without doing all you might do in a real game.)

The same basic strategy does apply, but the details are different. We are no longer counting **subsets**; in your denominator, 6^5, order **is** taken into account! That is what you missed.

(Note that it would be very hard to solve a dice problem without taking order into account, because duplicates are allowed, so rearranging would not always change the result.)

This is an important point; students often ask how you know whether order does or does not matter, and often in probability questions, that is up to you to decide. As long as the possibilities you count are equally likely, you can calculate probabilities as a ratio of permutations or of combinations. In a hand of cards, as I showed above, you can either think in terms of **how the cards are dealt** (permutations: order matters) or the **set** of cards you have in your hand (combinations: order doesn’t matter – or you might always sort them by number and suit, ignoring the order in which you got them). With dice, you generally want to think as if each die were **distinguishable** (different colors, say), which means that order (which die has which value) matters. You could conceivably ignore order, but it would be very hard to count, and what you counted would not be equally probable. (As a simple example, when you toss two coins, the only possibilities ignoring order are HH, HT, TT, but HT happens half the time.)
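To make the coin example concrete, here is a small sketch (an illustration of mine, not from the original answer) that lists the four equally likely ordered outcomes of two coins and then groups them by ignoring order:

```python
from collections import Counter
from itertools import product

# The 4 equally likely ordered outcomes: HH, HT, TH, TT.
ordered_outcomes = list(product("HT", repeat=2))

# Ignore order by sorting each outcome, so HT and TH collapse together.
unordered_counts = Counter("".join(sorted(o)) for o in ordered_outcomes)

for outcome, ways in sorted(unordered_counts.items()):
    print(outcome, ways, ways / len(ordered_outcomes))
```

The unordered outcome HT arises in two ways while HH and TT arise in one each, so a ratio of unordered counts would give wrong probabilities.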

So we choose, with good reason, to think of a roll of the dice not as a **set** of numbers (e.g. {1, 1, 2, 4, 4} – but that would actually be a *multiset*), but an **ordered list** of numbers (e.g. 2, 1, 4, 4, 1). On the other hand, this does not make it a permutation, because duplicates are allowed. There are \(6^5=7776\) of these, not \(_6\text{P}_5=720\).

So here is what you are actually doing:

- Choose which **number** to have a pair of: C(6,1) ways
- Choose which two **dice** the pair will be: C(5,2) ways
- Choose distinct numbers for the three other dice, **in order**: 5×4×3 ways (that is, P(5,3))

We don’t need to divide by 3!, because this time order **does** count. Our result is

C(6,1) × C(5,2) × P(5,3) = 3600

Dividing this by 6^5 = 7776, we get 0.4630 (46.3%).

Note that where we chose two **suits** last time, we are choosing two **dice** to have the chosen number: superficially similar calculations, but entirely different in meaning.

Alexander had calculated $$\frac{{6\choose1}\cdot{5\choose2}\cdot{5\choose1}\cdot{4\choose1}\cdot{3\choose1}}{3!}=\frac{6\cdot10\cdot5\cdot4\cdot3}{6}=600,$$ which amounts to $${6\choose1}\cdot{5\choose2}\cdot{5\choose3},$$ whereas we really have $${{6}\choose{1}}\cdot{{5}\choose{2}}\cdot _{5}\!\text{P}_{3}=6\cdot10\cdot60=3600.$$
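Since there are only 7776 ordered rolls, the count can also be confirmed by brute force. This is a minimal sketch, assuming "a single pair" means exactly one value appearing twice and the other three dice all different (the helper name `is_single_pair` is hypothetical):

```python
from collections import Counter
from itertools import product
from math import comb, perm

def is_single_pair(roll):
    # Value multiplicities must be exactly 2, 1, 1, 1.
    return sorted(Counter(roll).values()) == [1, 1, 1, 2]

# Every ordered roll of five distinguishable dice.
rolls = list(product(range(1, 7), repeat=5))

brute_force = sum(is_single_pair(r) for r in rolls)
formula = comb(6, 1) * comb(5, 2) * perm(5, 3)

print(brute_force, formula, len(rolls), brute_force / len(rolls))
```

The enumeration agrees with the version that does not divide by 3!, because each ordered roll is a separate, equally likely outcome.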

When I work a problem like this, I like to first ask, **What am I counting**, and then, **How can I count them?** In this case, because we have just done a very different problem that was misleadingly similar, the first question was the key.

Alexander replied,

Thank you!

I have searched everywhere to find an explanation like yours, but I have not found one until now. Again, thank you!

This week, while I was editing this post, we got a question that deals with a very similar problem. I normally let a question sit for a month before publishing it (to make sure the discussion is finished, and in some cases to make sure an assignment will be past due before showing the answer to the world), but this belongs here.

Bijan wrote, first quoting from a book (*Probability and Statistics with Applications: A Problem Solving Text*, by Leonard Asimow and Mark Maxwell) how to calculate the probability of a full house (3 of one denomination, and 2 of another) in poker:

Then he explained that he was trying to find the probabilities for the **“poker test” for random numbers**, which treats each sequence of 5 digits produced by a random number generator as a “poker hand” and compares the experimental probabilities of various “hands” to those expected for truly random numbers. He included a link that says this:

Then he showed his work for each case, starting with the **full house**:

We’ve 10,000 random numbers of five digit each. They’re assumed to be independent.

My calculations-:

1) Full house

10C1*9C1/10,000 = 0.009

I’m correct. My only confusion here would be the denominator. Why is it 10,000?

According to the above example, should not it be 10C5?

Explanation of my thought process-:

First pick 1 digit out of 10 digits. Then next, pick another digit(only 1 digit as we need a pair), out of remaining 9 digits.

I’ll quote more cases later. I answered this part:

Hi, Bijan.

It happens that the blog post I’m working on for this week is about almost exactly the same issue: the difference between problems about cards and about dice, which amount to random number generators. A central difference is that cards are **unique**, but digits can be **repeated**; so you tend to use permutations and combinations for cards, but not so much for numbers (and for different reasons). Another central difference is that **order is ignored** in poker hands, but **not** in numbers.

You need to essentially ignore everything you read about poker hands, and just think about numbers.

Even more clearly than dice, random numbers have a definite order of digits; 11234 is not the same as 21314.

Let’s look at your first case, the **full house**, to see whether you got the right answer rightly or by accident.

First, the denominator should not be 10C5, as you suggest, because you are not choosing 5 **different** digits **ignoring** order, but 5 **unrestricted** digits **counting** order!

It should be 10^5, which is 100,000. I have no idea why you said 10,000.

Later I realized what he had done:

Ah! I just looked at the link you put at the end; I see it says

In **10,000** random and independent numbers of five digits each, you may expect the following distribution of various combinations

That is not the number of **possible** numbers; they just chose to suppose that you **have** that many numbers in your sample, for the sake of the table following. You forgot to stop and think about where that number would have come from, and whether it made sense as you interpreted it.

In the table, each probability is multiplied by 10,000, the number of random numbers they suppose we are testing. The correct denominator is the number of possible 5-digit numbers, which is \(10^5=100,000\), counting all numbers from 00000 to 99999.

But that’s not the only error. Continuing,

As for the numerator, I would simply say we choose one of 10 digits to occur twice, and one of the remaining 9 digits to occur three times. But then we have to place them in some order, because **order does count**. So we choose 2 of the 5 places to put the first digit we chose, which we can do in 5C2 = (5*4)/(2*1) = 10 ways. So the probability of a full house is (10*9*10)/100,000 = 0.009.

Your answer just happened to be correct. The method was wrong.

The error of neglecting order was corrected by the error of using 10,000 as the denominator!

I would write it as $$P(\text{full house})=\frac{_{10}\text{P}_2\cdot{5\choose2}}{10^5}=\frac{10\cdot9\cdot10}{100,000}=0.009=0.9\%.$$

He made the same error for **one pair** and for **three of a kind**, thinking he was right because of an error hidden by the wrong denominator. That compensation failed in his work for **four of a kind**:

4) Four of a kind:

So from 10 digits, I need to pick 1 digit and out of the remaining 9 digits, I need to pick another 1 digit.

So, it should be 10C1*9C1/10,000 = 0.009

But it becomes similar to full house. This is wrong. I don’t get why this became wrong.

I didn’t directly answer this, but he has again omitted the choice of a location for the four, which can be done in \({5\choose4}=5\) ways. The correct answer is $$P(\text{four of a kind})=\frac{{10\choose1}\cdot{9\choose1}\cdot{5\choose4}}{10^5}=\frac{10\cdot9\cdot5}{100,000}=0.0045=0.45\%.$$

For the last two cases he did, “**all different**” and **five of a kind**, the only error turned out to be the denominator. Here is the former:

5) 5 different digits:

This should’ve been simple, I got the answer but I got the answer greater than 1.

10C1*9C1*8C1*7C1*6C1/10,000

=3.024

I’m not sure why I got this. I am skeptical about the denominator since the start as I feel that’s randomly chosen here unlike above where we did 52C5. If I increase 1 “zero” in denominator, the answer would be correct.

To this part, I answered:

This is a case where I would use **permutations**, but not in the same way as for cards. The numerator will be the number of ways to form a number consisting of 5 different digits, which means a permutation of 5 of the ten possible digits. That is

10P5 = 10*9*8*7*6 = 10!/5! = 30,240.

The denominator, again, is the number of ways to choose 5 (non-distinct) digits, which is just 10^5 = 100,000, since there are 10 independent ways to choose each. So the probability is

30,240/100,000 = 0.3024.

Your numerator is correct, and your answer would have been correct if you had used the correct denominator.

That is, $$P(\text{all different})=\frac{_{10}\text{P}_5}{10^5}=\frac{10\cdot9\cdot8\cdot7\cdot6}{100,000}=0.3024=30.24\%.$$
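All of these "poker test" counts can be confirmed by listing every 5-digit string from 00000 to 99999 and classifying it by how often each digit repeats. This is a sketch under my own conventions (the `pattern` helper and its tuple encoding are hypothetical, not taken from the linked table):

```python
from collections import Counter
from itertools import product

def pattern(digits):
    # Multiplicities of the digits, largest first, e.g. (3, 2) for a full house.
    return tuple(sorted(Counter(digits).values(), reverse=True))

# Classify all 10^5 ordered strings of five digits.
counts = Counter(pattern(d) for d in product(range(10), repeat=5))
total = sum(counts.values())

for name, key in [("full house", (3, 2)),
                  ("four of a kind", (4, 1)),
                  ("all different", (1, 1, 1, 1, 1))]:
    print(name, counts[key], counts[key] / total)
```

Multiplying each probability by 10,000 gives the expected frequency in a sample of that size, which is the role the 10,000 plays in the quoted table.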

I concluded,

Ultimately, as I said at the start, you need to simply ignore what is done with cards, and think only about how random numbers work. The comparison of the “poker test” with poker is very misleading! So now go through all your cases, including those where you got the right answer, and correct them. Then we can have another look.

The corrections will be straightforward.


We’ll start with three different perspectives on the same problem, \(\sqrt{i}\). First, from 2003:

The Square Root of i

What is the square root of i?

This question was actually asked in 1997, but an answer to an identical question from Doctor Schwa was appended to the page, which makes a good intro:

Good question! When I was in high school and confronted with this same problem, it seemed obvious to me that the answer must be "j". Just like we needed to invent a new number "i" to be the square root of -1, it seemed like we'd need yet another new kind of number to be the square root of "i", and so on forever.

Since we had to “invent” a square root of \(i\), shouldn’t we have to keep doing that? Amazingly, no!

The amazing thing is that you don't: once you have "i", any equations with addition, multiplication, exponents and so on (in short, any polynomial), can be solved without inventing any new types of numbers! An equation like x+3 = 2 makes you invent negatives, x*3 = 2 makes you invent fractions, x*x = 2 makes you invent irrationals, x*x = -2 makes you invent imaginaries... but then you're done!

This is what makes complex numbers central to much of modern mathematics. They are the ultimate kind of number (in this sense … there are actually more, but they’re very different).

On to the question itself: Just as the square root of 4 is a number whose square is 4, here we want a (complex) number whose square is *i*.

So, once I knew it was possible, and that the answer had to be some complex number (a+bi), the question is, how do you find out values of a and b that will make (a+bi)^2 = i? Well, squaring out the left side gives a^2 + 2ab i - b^2 = i, and the only way for that to work is if the real number part is 0, a^2 - b^2 = 0, and the imaginary part is 1*i, 2ab = 1.

So because we are assuming *a* and *b* are real numbers, our single equation \((a+bi)^2=i\), which is equivalent to \((a^2-b^2)+(2ab)i=i\), becomes a system of two (non-linear) equations: $$a^2-b^2=0\\2ab=1$$ This is not hard to solve. (We’ll see harder examples later.)

Since a^2 = b^2, a = b or -b ... but since 2ab = 1, a and b must be both positive or both negative, so a = b. Then since 2ab = 1, and a = b, 2aa = 1, so a^2 = 1/2, and a = b = sqrt(1/2) or a = b = -sqrt(1/2)! Does that make sense?

That gives us the answer: $$a+bi=\frac{\sqrt{2}}{2}+\frac{\sqrt{2}}{2}i\text{ or }-\frac{\sqrt{2}}{2}-\frac{\sqrt{2}}{2}i.$$

As usual, there are two square roots. More on that later!
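A numerical check of this solution, as a minimal sketch using Python's built-in complex numbers (the names `a`, `b`, and `roots` simply mirror the algebra above):

```python
import math

a = b = math.sqrt(1 / 2)   # from a = b and 2ab = 1
roots = [complex(a, b), complex(-a, -b)]

# The two equations of the system, evaluated at the solution.
print(a * a - b * b, 2 * a * b)

for z in roots:
    # z*z should come out as i, up to floating-point rounding.
    print(z, z * z)
```

Both candidates square to *i* (within rounding), confirming that the system has exactly these two real solutions for (a, b).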

For another presentation of this approach, starting with the definition of complex numbers as ordered pairs, see

Square Root of i

For a different approach to the algebra, see

An Algebraic Derivation of the Square Root of i

But algebra isn’t the only way, or the most powerful. Here is Doctor Anthony’s brief answer to the 1997 question:

We have i = cos(pi/2) + i.sin(pi/2)

sqrt(i) = (cos(pi/2) + i.sin(pi/2))^(1/2)

By DeMoivre's theorem:

= cos(pi/4) + i.sin(pi/4)

= (1 + i)/sqrt(2)

Since the argument (angle) of *i* is \(\frac{\pi}{2}\), or 90°, and the square root is the \(\frac{1}{2}\) power, he just plugged these into DeMoivre’s formula, $$(\cos(x)+i\sin(x))^n= \cos(nx)+i\sin(nx),$$ to get $$\left(\cos\left(\frac{\pi}{2}\right)+i\sin\left(\frac{\pi}{2}\right)\right)^\frac{1}{2}= \cos\left(\frac{\pi}{4}\right)+i\sin\left(\frac{\pi}{4}\right)=\frac{1}{\sqrt{2}}+\frac{1}{\sqrt{2}}i,$$ which is the same result as before:

So taking the square root of a number takes the square root of the modulus (length), and halves the argument (angle).

But ... what about the other root? For some reason he omitted that. (We’ll see him get this right in a later answer!)

This is where roots get tricky using this method. Recall from trigonometry that any angle has “coterminal angles” that start and end in the same place, but go around different numbers of times. Our 90° angle could also be called 450° (by adding 360°), or -270° (by subtracting 360°), and so on. When we divide those by 2, we get 225° or -135°, respectively. Either of those gives the same second root:

In general, since we can add any multiple of \(2\pi\) to an angle, when we divide it by 2, we are adding any multiple of \(\pi\) to our square root, which gives us roots that are 180° apart, as here:

Note that if we square either of these, we get *i*, by doubling the angle.

Here is yet another version of the same question, from 1999:

Square Root of i

In my Honors Algebra II/Trigonometry class, we just completed a section on complex numbers. One of my students asked me the following: The square root of -1 is i, but what is the square root of i? Can you help?

Doctor Rick answered, starting with a warning:

There are two complex numbers which, when squared, equal i. (The same holds true for any number. But for real numbers, we arbitrarily say that the positive root is THE square root. When the roots have imaginary parts, any such choice would be even more arbitrary, and we do not bother to choose one.)

This is an interesting point: Not only are there two roots, but we can’t call one of them the principal root, in the same way we do in saying that \(\sqrt{9}=3\text{, not}-3\). One reason is that neither root is positive, which applies only to real numbers, so we can’t use the rule we are used to; another is that whatever rule we used, it would not always be true that \(\sqrt{w}\cdot\sqrt{z}=\sqrt{wz}\) for all complex numbers *w* and *z*. We could arbitrarily *choose* to call one of them the principal root (and do, for some purposes), but that doesn’t make everything work as expected, as it does for real numbers! This will be the topic of another post someday.

In particular, when textbooks say to write \(\sqrt{-1}=i\), it is just for convenience, and doesn’t mean that is really *the* root.

The square roots of i are

(1 + i) * sqrt(2)/2 and -(1 + i) * sqrt(2)/2

You can prove this just by squaring each number. To find them in the first place, you can use Euler's formula

e^(i*t) = Cos(t)+i*Sin(t)

If t = pi/2, you get

e^(i*pi/2) = cos(pi/2)+i*sin(pi/2) = 0 + i*1 = i

The square root of this is

(e^(i*pi/2))^(1/2) = e^(i*pi/4)

and using Euler's formula again, we have

e^(i*pi/4) = sqrt(2)/2 + i*sqrt(2)/2

But we aren’t finished, are we?

If you set t = 5*pi/2, you again get e^(i*5pi/2) = i. Taking the square root of this and using Euler's formula, you get the other root of i.

This can be easy to forget, whatever method you use. You need to keep in mind that there will be two square roots (and three cube roots, four fourth roots, and so on). Or, just use the general form: $$\left(e^{i\left(\frac{\pi}{2}+2k\pi\right)}\right)^\frac{1}{2}=e^{i\left(\frac{\pi}{4}+k\pi\right)}.$$

The angles \(\frac{\pi}{2}\) and \(\frac{5\pi}{2}\) are the 90° and 450° we saw before.
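The general form can be turned into a few lines of code. This is a sketch of the idea using `cmath.polar` and `cmath.rect` from the standard library; the function name `square_roots_polar` is my own, and it takes the modulus's square root as well so that it is not limited to numbers on the unit circle:

```python
import cmath
import math

def square_roots_polar(w):
    """Both square roots of w, from the angles theta and theta + 2*pi."""
    r, theta = cmath.polar(w)          # modulus and principal argument
    return [cmath.rect(math.sqrt(r), (theta + 2 * math.pi * k) / 2)
            for k in (0, 1)]           # k = 0, 1 gives roots 180 degrees apart

for z in square_roots_polar(1j):
    print(z, z * z)
```

Using k = 2 would add another full turn to the halved angle and simply repeat the first root.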

Euler's formula makes it easy to find powers and roots by working in polar coordinates in the complex plane. Any number x + iy can be written in terms of a radius r and angle theta (counterclockwise from the x axis):

x + iy = re^(i*theta)

r = sqrt(x^2 + y^2)

theta = arctan(y/x)

Then, using Euler's formula,

(x+iy)^k = r^k e^(i*k*theta) = r^k(cos(k*theta) + i*sin(k*theta))

This amounts to the proof of DeMoivre we saw last time. But don’t forget to adjust \(\theta\) to be in the right quadrant.

How about the square root of a complex number other than *i*? Here is a (poorly titled!) question from 2002:

Finding the Square Root of a Quadratic Function

I was recently assigned the problem

Find the square root of 3+4i

I am in advanced college math and am totally stumped by this question. Please help me...

Doctor Paul answered, using the algebraic approach:

If sqrt(3+4*i) = a + b*i where a and b are real numbers, then squaring both sides gives:

(a + b*i) * (a + b*i) = 3 + 4*i

a^2 + 2*a*b*i - b^2 = 3 + 4*i

equating coefficients gives:

a^2 - b^2 = 3 and 2*a*b = 4

These are the same equations we saw before, but with different constants on the right, which make us do a little more work this time:

Solve for b in the second equation and substitute into the first equation:

b = 2/a

so

a^2 - (2/a)^2 = 3

a^2 - 4/a^2 = 3

multiply both sides by a^2:

a^4 - 3*a^2 - 4 = 0

The substitution left us with a variable on the bottom, so we cleared that and got what is sometimes called a biquadratic equation.

This is quadratic in a^2 so we make a substitution:

Now let x = a^2

So we have:

x^2 - 3*x - 4 = 0

(x-4)*(x+1) = 0

So x = 4 or x = -1

This gives a = 2, -2, i, -i

We said above that a had to be real, so it must be the case that a = 2 or a = -2.

Now, if a = 2 then b = 1 and if a = -2 then b = -1

So we have:

sqrt(3 + 4*i) = (2 + i) or -(2 + i)

Each of the two solutions for *x* gives us two solutions for *a*, and each **real** value of *a* yields a corresponding value for *b*. Here are our roots:

We could also do this using Euler or DeMoivre.
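The algebraic steps carry over to any number p + qi with q ≠ 0: substituting b = q/(2a) into a² − b² = p gives x² − px − q²/4 = 0 for x = a², and only the nonnegative solution for x yields a real a. The sketch below packages that; the closed form for x is a generalization not stated in the original answer, and the function name `sqrt_algebraic` is hypothetical:

```python
import math

def sqrt_algebraic(p, q):
    """Both square roots of p + q*i (real p, nonzero real q), found algebraically."""
    # Quadratic formula on x^2 - p*x - q^2/4 = 0, keeping the positive root;
    # the other root is negative and would make a imaginary.
    x = (p + math.sqrt(p * p + q * q)) / 2
    a = math.sqrt(x)
    b = q / (2 * a)                    # from 2ab = q
    return complex(a, b), complex(-a, -b)

for z in sqrt_algebraic(3, 4):
    print(z, z * z)
```

For 3 + 4i this gives x = 4, a = ±2, b = ±1, matching the roots found above.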

Enough of square roots; let’s take it up a notch, with this question from 1997 on cube roots:

Cube Roots of Numbers

If x^3 = N, where N is some expression (which could be a constant), then you have a degree three equation so there must be three roots. If you take i (sqrt(-1)), then the cube root is -i. But since x^3 = i is degree three, there should be three different values of x. What are they? How do you determine these three values for other numbers? Is there a formula? Please help.

Presumably Jared found the one answer just by recognizing that \((-i)^3=(-i)(-i)(-i)=(-i)(-1)=i\). He recognizes that the cubic equation should have three solutions (though sometimes there can be fewer due to multiplicity). We’ve seen pairs of square roots; how can we find triples of cube roots?

Doctor Anthony answered, focusing first on the general idea of getting three roots, rather than the specific example of \(\sqrt[3]{i}\):

You are quite right that there will be 3 cube roots of a number. You do need to work with complex numbers, however, to understand how to find the three roots, so if what I show you is not clear, it will become so when you have studied complex numbers.

Suppose z^3 = 8

Now taking the cube root of each side you would say that z = 2, however, there are two other cube roots which we shall now find.

Since cos(2k.pi) = 1 and sin(2k.pi) = 0 where k is any integer, we could write the equation

z^3 = 8(cos(2k.pi) + i.sin(2k.pi))

The key here is to take into account *all representations of the angle*, not just the principal argument. The real number 8 has angle 0, but also any multiple of 360° more or less than that – that is, any integer multiple of \(2\pi\). So we write 8 not just as $$8\left(\cos(0) + i\sin(0)\right)$$ but as $$8\left(\cos(2k\pi) + i\sin(2k\pi)\right).$$

Take cube root of both sides, and use deMoivre's theorem which shows that:

[cos(x) + i.sin(x)]^(1/3) = cos(x/3) + i.sin(x/3)

to get

z = 2[cos(2k.pi/3) + i.sin(2k.pi/3)]   k = 0, 1, 2

k=0 gives z1 = 2(cos(0) + i.sin(0)) = 2 (the one real root)

k=1 gives z2 = 2(cos(2.pi/3) + i.sin(2.pi/3)) = 2(-1/2 + i.sqrt(3)/2)

k=2 gives z3 = 2(cos(4.pi/3) + i.sin(4.pi/3)) = 2(-1/2 - i.sqrt(3)/2)

If we give k more values, 3, 4, 5, ..... we simply repeat the three roots already found.

If we didn’t know to expect three roots, we would just keep going until we found that we were repeating. So, rather than explicitly memorizing how many roots there are, we just have to keep the periodicity of the trig functions in mind.

If you represent the three roots on an Argand diagram that has real values along the x axis and imaginary values on the y axis, the three roots will appear as the three spokes of a wheel, with the z values lying on a circle of radius 2 units. One root will lie along the positive x axis, and the other two at +120 degrees and -120 degrees to the x axis. So the roots are symmetrically spaced round the circle. In fact this is always the way that cube roots of a real number will look. If you take the cube root of an imaginary number, say i, then you still get three spokes but they will be rotated round to lie along the 30 degree, 150 degree and 270 degree lines on the unit circle.

Here are the three cube roots of 8: $$z_1=2\left(\cos(0)+i\sin(0)\right)=2\\z_2=2\left(\cos\left(\frac{2\pi}{3}\right)+i\sin\left(\frac{2\pi}{3}\right)\right)=-1+i\sqrt{3}\\z_3=2\left(\cos\left(\frac{4\pi}{3}\right)+i\sin\left(\frac{4\pi}{3}\right)\right)=-1-i\sqrt{3}$$

We could instead have solved this algebraically: $$z^3-8=0\\(z-2)(z^2+2z+4)=0\\z_1=2,z=\frac{-2\pm\sqrt{-12}}{2}\\z_2=-1+i\sqrt{3},z_3=-1-i\sqrt{3}$$

How about the cube roots of *i*? Here they are, by the same sort of work:

In this case, the angle for *i* is $$\frac{\pi}{2}+2k\pi=\frac{4k+1}{2}\pi=\frac{\pi}{2},\frac{5\pi}{2},\frac{9\pi}{2},\dots,$$ so the angle for its cube root is 1/3 of that, namely $$\frac{4k+1}{6}\pi=\frac{\pi}{6},\frac{5\pi}{6},\frac{9\pi}{6}=\frac{3\pi}{2}.$$

So $$z_1=\cos\left(\frac{\pi}{6}\right)+i\sin\left(\frac{\pi}{6}\right)=\frac{\sqrt{3}}{2}+\frac{1}{2}i\\z_2=\cos\left(\frac{5\pi}{6}\right)+i\sin\left(\frac{5\pi}{6}\right)=-\frac{\sqrt{3}}{2}+\frac{1}{2}i\\z_3=\cos\left(\frac{3\pi}{2}\right)+i\sin\left(\frac{3\pi}{2}\right)=-i\\$$
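The same recipe (divide every representation of the angle by 3, and take the real cube root of the modulus) can be sketched in Python for both examples. The helper name `nth_roots` is my own, and the code relies only on `cmath.polar` and `cmath.rect`:

```python
import cmath
import math

def nth_roots(w, n):
    """The n distinct nth roots of a nonzero complex number w."""
    r, theta = cmath.polar(w)
    # Angles theta, theta + 2*pi, ..., each divided by n; k = n would repeat k = 0.
    return [cmath.rect(r ** (1 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

for w in (8, 1j):
    print(w, [complex(round(z.real, 6), round(z.imag, 6)) for z in nth_roots(w, 3)])
```

For 8 the spokes sit at 0°, 120°, and 240°; for *i* they are rotated to 30°, 150°, and 270°, as described above.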

Now, can we solve this one algebraically? As Jared said, we need to solve the equation $$z^3-i=0$$ Knowing that one root is \(-i\), we can factor using the familiar formula for a difference of cubes, $$a^3-b^3=(a-b)(a^2+ab+b^2)$$ and obtain $$z^3-(-i)^3=(z+i)(z^2-iz-1)$$ So all we have to do is to solve $$z^2-iz-1=0$$ using the quadratic formula: $$z=\frac{i\pm\sqrt{(-i)^2-4(1)(-1)}}{2}=\frac{i\pm\sqrt{3}}{2}=\pm\frac{\sqrt{3}}{2}+\frac{1}{2}i$$

I don’t believe I’ve ever done that before!

Now let’s put it all together (a higher root of a complex number), with this question from 2005:

Finding Roots of Complex Numbers

How do you find the nth roots of a complex number a + bi?

Doctor Jerry answered with a specific example:

Hello Kira,

I'll answer by showing how to find the fifth roots of 2 + 3i. We convert 2 + 3i to polar form and look for complex numbers in the polar form

r*e^{i*t} = r*(cos(t) + i*sin(t)).

Since

2 + 3i = sqrt(13)*e^{i*arctan(3/2)},

we want

( r*e^{i*t} )^5 = sqrt(13)*e^{i*arctan(3/2)}

This is

r^5*e^{i*5t} = sqrt(13)*e^{i*arctan(3/2)}

This time the numbers don’t work out quite as nicely as for our previous examples, so this is more typical. But notice that he doesn’t just take the 1/5 power. Instead, he solves an equation, by **setting moduli and arguments equal**, much as we have elsewhere set **real and imaginary parts** equal. (The same thing would work if you used DeMoivre but were only allowed to use integers, because the general case had not been proved yet.)

We see that

r^5 = sqrt(13) and so r = [13^(1/2)]^(1/5) or 13^(1/10)

and also that

5t - arctan(3/2) = 2*pi*n, n = 0,1,2,3,4

So,

t = [2*pi*n + arctan(3/2)]/5, n = 0,1,2,3,4.

He finds the general solution to the trigonometric equation, and takes the first 5 solutions.

Let's look at one of these "fifth roots." Take n = 3.

r = 13^{1/10} = 1.29239222078083

t = [2*pi*3 + arctan(3/2)]/5 = 3.96646992895723

r*e^{i*t} = -0.87707824256 - i*0.949216207595

The fifth power of this complex number is 2 + 3*i.
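Doctor Jerry's steps translate directly into code. The following is a minimal sketch that follows his formulas for r and t literally (the function name `fifth_root` is hypothetical) and raises each result to the fifth power as a check:

```python
import cmath
import math

r = 13 ** (1 / 10)                     # from r^5 = sqrt(13)

def fifth_root(n):
    """The fifth root of 2 + 3i with index n = 0, 1, 2, 3, 4."""
    t = (2 * math.pi * n + math.atan(3 / 2)) / 5
    return cmath.rect(r, t)            # r*(cos(t) + i*sin(t))

for n in range(5):
    z = fifth_root(n)
    print(n, z, z ** 5)
```

The n = 3 line should reproduce the value quoted above, and every fifth power should return 2 + 3i up to rounding.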

Here are our roots, with the \(n=0\) and \(n=3\) cases marked out:

Now look at that last picture. There are two parts to finding these roots. We can find the first root by just dividing the argument of the radicand by 5:

Then we can find the others by multiplying this by the five “fifth roots of unity” (“unity” being a name for “1”), which are 1 and four others:

We can talk similarly about the *n*th roots of unity for any positive integer *n*; they form an *n*-gon inscribed in the unit circle.
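This two-part view (one root, then multiply by the roots of unity) can be sketched as follows. It is an illustration under my own naming (`roots_of_unity`, `first`), using `cmath.phase` for the principal argument of the radicand:

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex numbers whose nth power is 1, evenly spaced on the unit circle."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

w = 2 + 3j
# First root: nth root of the modulus, argument divided by n.
first = cmath.rect(abs(w) ** (1 / 5), cmath.phase(w) / 5)
# Multiplying by each root of unity rotates the first root by a multiple of 72 degrees.
roots = [first * u for u in roots_of_unity(5)]

for z in roots:
    print(z, z ** 5)
```

Because each root of unity has modulus 1, the multiplication only rotates the first root, which is why all the roots lie on one circle at equal spacing.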