May was a particularly good month for interesting questions! Here is one requiring us to find one value of a function, based on an unusual property: If \(a+b=2^x\), then \(f(a)+f(b)=x^2\). The problem turned out to be not as hard as it looked, yet the function itself is quite interesting when we step back to look at the whole thing. We’ll see a couple valuable strategies for solving unfamiliar, non-routine problems along the way.

The question came from Manas in late May:

I cannot make use of the given data:

This posed a problem for me, as we do not help students cheat on tests. Too often, students just attach a PDF of a test and, in effect, say “Do this test for me so I can get the credit”. (We politely refuse, and in more egregious cases may try to notify the instructor.) This was not quite like that, so I gave Manas the benefit of the doubt, giving just the most basic hint along with a warning:

Hi, Manas.

First, this is clearly a test for which you want to get credit, so I can’t come close to giving you the answer without cheating for you. (We sometimes report attempts at cheating.)

All I will say is the very first thing I thought of when I saw the problem:

What is the next power of 2 greater than 2014? I was able to quickly solve it starting from there.

Clearly the problem is intended to elicit some creative thinking, as it is far from routine. So my goal was to get Manas to think for himself, with a minimal nudge.

Manas replied:

Yes it was of test, I posted it after the test.

I found that next power of 2 is 2048 and 2014 can be written as 2048 – (2⁵ + 2¹).

But that was all I could do.

I also tried to write 2^x + x² = 2014 (by guess work) don’t know if it’s the correct way.

(I can also share a proof that I posted it after the test by attaching a screenshot of paper timings.)

Manas has gone a little beyond my question, showing good initiative; but has perhaps gone a little too fast. Trying even random things can sometimes be fruitful, especially when you have no good ideas! But it would be more helpful to directly relate what we are seeing to the form of the statements in the problem; since it talks about \(a+b=2^x\), we want to see something like \(2014+34 = 2^{11}\) (the sum of 2014 and another number equaling a power of 2). Writing 34 as \(2^5+2^1\) (effectively, in binary) seems potentially useful, breaking it down into a series of powers of 2, but it’s probably better to take smaller steps. The form \(2^x+x^2=2014\) is probably irrelevant, since we want \(f(a)+f(b)=x^2\), which relates the *value* of the function, not its *inputs*, to a square.

I could have asked for the offered evidence of ethical behavior, but by this time it was more than three hours after the initial question, so I could be sure the 180-minute test was not still current, regardless. (And now it’s over two months later, so I hope it’s safe to be publishing an answer.) So I continued with hints, showing the relevant form:

Nothing in the problem is about the sum of two powers of 2, or about the sum of a power of 2 and a second power, so I don’t think your ideas lead in a useful direction (though trying anything can be of use, because we don’t always see all the implications!).

You found that 2014 = 2048 – 34, so 2014 + 34 = 2^11. What does that tell you about f(2014)?

Now, how might you find f(34)?

My strategy at this point is just to do whatever I can do, taking one step at a time without being sure what comes next; but in doing so, I observe that I’ve reduced the original problem to a similar problem with a smaller number. That seems like progress. Maybe the problem will be recursive?

Manas wasn’t yet ready to proceed:

Thank you for helping and replying till now.

Even with this hint I am still not able to crack it.

It’s much more helpful when you show some actual attempt, even if it went nowhere, than just to say that whatever you tried went nowhere – it’s *what you do*, more than *whether it worked*, that shows us how we can best help.

I gave a more direct hint:

Can you show me what you tried using my hint?

Here is what I said:

You found that 2014 = 2048 – 34, so 2014 + 34 = 2^11. What does that tell you about f(2014)?

Do you see that 2014 + 34 = 2^11 implies that f(2014) + f(34) = 11^2 ? So if you can find f(34), you can find f(2014).

Repeat the same thinking for the number 34, hoping to get f(34) in terms of smaller numbers.

Another direction you could think is to try letting a and b be various small numbers (e.g. 1 and 1, 1 and 2, etc.) to see if you can find the function value for some small numbers. (If they weren’t required to be positive, I would try zeros.)

We’ll be following the first suggestion; but first, let’s look at what I had in mind in that last paragraph.

Often when I see a functional equation, I start by plugging in small numbers to see what will happen. Some possibilities here (pairs of positive integers whose sum is a power of 2) are

$$a+b=2^x\;\Rightarrow\; f(a)+f(b)=x^2$$

$$1+1=2=2^1\;\Rightarrow\; f(1)+f(1)=1^2=1\;\Rightarrow\; 2f(1)=1\;\Rightarrow\; f(1) = \frac{1}{2}$$

$$2+2=4=2^2\;\Rightarrow\; f(2)+f(2)=2^2=4\;\Rightarrow\; 2f(2)=4\;\Rightarrow\; f(2) = \frac{4}{2} = 2$$

$$1+3=4=2^2\;\Rightarrow\; f(1)+f(3)=2^2=4\;\Rightarrow\; f(3) = 4-f(1) = 4-\frac{1}{2}=\frac{7}{2}$$

This doesn’t seem to be leading to a nice pattern we can generalize, so this “bottom-up” strategy by itself doesn’t look like it will give us a quick answer. But we’ll get back to this, after spending some time with the other strategy!

Manas made some progress by starting with 2014:

Yes I tried it out later on till I reached till f(2).

Now closest power of 2 for f(2) would be 1.

So a = 0 and b = 2.

However I could not get a definite value which would help me to solve f(2014)+ f(34).

The last equation was f(0) + f(2) = 2¹.

This is excellent until the point where he is stuck. Let’s work through the details so far:

We are told that $$\text{if } a+b=2^x\text{, then }f(a)+f(b)=x^2$$ We want to find \(f(2014)\), so we let \(a=2014\) and \(b = 34\), and conclude that $$\text{since } 2014+34=2048=2^{11}\text{, then }f(2014)+f(34)=11^2$$ Therefore \(f(2014) = 121-f(34)\).

So now we want to find \(f(34)\). We find that the next power of 2 greater than 34 is \(64=2^6\).

Now we repeat: Let \(a=34\) and \(b = 30\), and conclude that $$\text{since } 34+30=64=2^6\text{, then }f(34)+f(30)=6^2$$ Therefore \(f(34) = 36-f(30)\).

That didn’t take us much farther, but we’ve crossed over another power of 2. To find \(f(30)\), we see that the power of 2 greater than 30 is \(32=2^5\).

Now we repeat again: Let \(a=30\) and \(b = 2\), and conclude that $$\text{since } 30+2=32=2^5\text{, then }f(30)+f(2)=5^2$$ Therefore \(f(30) = 25-f(2)\).

But 2 is itself a power of 2; and we aren’t allowed to take \(b=0\). So we’re stuck. We need a new strategy.

(Actually, since I showed the bottom-up strategy above, you may notice we’ve already discovered what to do!)

I gave another hint:

The technique does have to change a little to find f(2). (Note that you can’t use 0, as I mentioned.)

What does f(2) + f(2) equal?

(This is where playing with small numbers might have been helpful, because you might have discovered this before getting into the routine of working from large to small numbers.)

Having played with small numbers myself, I knew what to suggest this time! But notice that the change needed here was merely to avoid zero by popping back up to the next higher power of 2, making the sum 4.

That was enough. Manas answered,

Thank you for your help I was able to crack it and the answer which came was 9.

I solved using the last hint you gave which helped me to find f(2) which came equal to 2.

I used this value to find f(30) thus finally solving for f(2014) which came equal to 108 .

Once again thank you sir.

As we found above, \(f(2) = 2\). So we can put everything together:

$$f(2014) = 121-f(34)\\ f(34) = 36-f(30)\\ f(30) = 25-f(2)\\ f(2) = 2$$ so $$f(30) = 25-f(2) = 25-2=23\\ f(34) = 36-f(30) = 36-23 = 13\\ f(2014) = 121-f(34) = 121-13 = 108$$

We were asked for $$\frac{f(2014)}{12} = \frac{108}{12} = 9$$

I acknowledged the correct answer, and suggested a further step:

Perfect!

Fun problem, isn’t it? Now, I wonder if we could find f(x) for all positive integers?

The last step in problem solving, as I express it, is to **look back** (checking your answer, and looking for alternative methods), **look around** (seeing if the problem suggests any interesting extensions), and **look ahead** (thinking about how you might use what you learned for future problems). That’s what I’m doing here.

Manas gave the kind of response I hoped for, showing interest beyond answering the one problem:

Yes we can find f(x) for all positive integers by the same mechanism.

We only require values of principle functions (here) f(1), f(3), f(5) and f(7) if required.

Cause rest all numbers eventually break down to ‘something’ + f(1)/f(3)/f(5).

If we have to find f(3), then it can be written as

3+1=2²

f(3)+f(1)=2²

f(1)=1/2

Thus we can find f(3) and for all positive integers.

I’m not sure of the significance of 1, 3, 5, 7; but it is true that we will always come down to some special case in a small number, as we saw above in “playing” with them. But what he’s done is to suggest an existence proof, that the function is in fact defined for all positive integers.

Not wanting to get too deep into this question unless Manas chose to, I concluded:

Yes, I think it could be done; perhaps it would even be possible to make a formula for it.

Of course, we don’t need to do it; you’ve solved the problem as stated.

But let’s go ahead and look into it.

We found two cases, depending on whether *a* is a power of 2 or not.

If it is, then \(a = 2^n\) and $$a+b=2^x\;\Rightarrow\; f(a)+f(b)=x^2\\ 2^n+2^n=2^{n+1}\;\Rightarrow\; f(2^n)+f(2^n)=(n+1)^2\\ f(2^n)=(n+1)^2/2$$

Otherwise, let \(n= \lceil \log_2(a)\rceil\), where the special symbols represent the ceiling function (rounding up), and we have a recursive formula, $$a+b=2^n\;\Rightarrow\; f(a)+f(b)=n^2\\ f(a)=n^2-f(2^n-a)$$ where \(0\lt b=2^n-a\lt a\), because \(2^{n-1}\lt a \lt 2^n\). This recursion will always terminate, so we know we can evaluate the function for any positive integer.

Putting this together, we have a piecewise function: $$\text{Let }n=\lceil\log_2(a)\rceil$$

$$f(a)=\left\{\begin{matrix} \frac{(n+1)^2}{2} & \text{if }a=2^n \\ n^2-f(2^n-a) & \text{otherwise} \end{matrix}\right.$$
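The piecewise definition translates almost directly into a short recursive program. The following is a minimal Python sketch, offered only as an illustration: the function name `f` mirrors the notation above, while the use of `Fraction` (to keep values such as 1/2 exact) and of `bit_length` (to get the ceiling of the base-2 logarithm without floating-point rounding) are implementation choices assumed here.

```python
from fractions import Fraction


def f(a: int) -> Fraction:
    """Evaluate f(a) for a positive integer a using the piecewise rule."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    # n = ceil(log2(a)), computed with integers only:
    # (a - 1).bit_length() is the smallest n with 2**n >= a.
    n = (a - 1).bit_length()
    power = 1 << n  # 2**n
    if a == power:
        # Power-of-2 case: f(2^n) + f(2^n) = (n + 1)^2
        return Fraction((n + 1) ** 2, 2)
    # Otherwise a + (2^n - a) = 2^n, and 2^n - a is smaller than a,
    # so the recursion moves to a smaller input each time.
    return n * n - f(power - a)


if __name__ == "__main__":
    for a in (1, 2, 3, 30, 34, 2014):
        print(a, f(a))
    print("f(2014)/12 =", f(2014) / 12)
```

Run as a script, this reproduces the values found by hand: f(2) = 2, f(30) = 23, f(34) = 13, f(2014) = 108, and f(2014)/12 = 9.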

I made a spreadsheet to evaluate this; here are the first 32 rows:

The graph (for *a* through 2048, here) looks very interesting:

The red line is \(f(x) = \frac{1}{2}\left(\log_2(x)+1\right)^2\), which passes through the points for powers of 2. (And, yes, the spreadsheet shows that \(f(2014) = 108\).) In fact, it looks rather like a fractal; here we zoom in to the part of the graph through 512, the first quarter of the graph above:

I don’t think there will be a nice closed formula for *f*. But it may turn out to be expressible in terms of the binary digits of the input. Maybe a reader would like to pursue that.

We received a couple different questions recently about solving differential equations by separation of variables, and why the method is valid. We’ll start with a direct question about it, and then look at an attempt at an alternate perspective using differentials.

Here is the first question, from teacher Vaneeta, with the title “Differential Equations – separation of variables”:

Dear Dr Math,

It makes sense why I am able to do this but my students have never seen me use this technique before. Is there a way I can explain why we can separate the variables and integrate w.r.t. a different variable on either side of the equation?

Thank you for your help.

Let’s start with an example of this technique, so we can see where Vaneeta’s question comes from. I’ll use one from a site we have often referred students to for lessons: Paul’s Online Notes. Here is the first example from that page:

Solve the following differential equation and determine the interval of validity for the solution.

$$\frac{dy}{dx} = 6y^2x\text{ given that } y(1) = \frac{1}{25}$$

We first move all the *y*‘s to the left and *x*‘s to the right, treating the differentials *dx* and *dy* as if they were variables: $$y^{-2}dy=6xdx$$

We integrate both sides to get an implicit solution:$$\int y^{-2}dy=\int 6xdx\\ -y^{-1}=3x^2+C$$

Before solving for *y*, we can find the constant by using the given value \(y(1)=\frac{1}{25}\): $$-\left(\frac{1}{25}\right)^{-1}=3(1)^2+C\\ \\-25=3+C\\ \\C=-28$$

Now we plug in the value of *C* and solve for *y*: $$-y^{-1}=3x^2-28\\ y^{-1}=28-3x^2\\ y=\frac{1}{28-3x^2}$$
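As a sanity check, the explicit solution can be tested numerically against the original equation. The sketch below is an illustration under assumptions: the helper names `y`, `dy_dx_numeric`, and `residual` are hypothetical, and the derivative is only estimated with a central difference, so the residual should be tiny rather than exactly zero.

```python
def y(x: float) -> float:
    """Explicit solution found above: y = 1 / (28 - 3x^2)."""
    return 1.0 / (28.0 - 3.0 * x * x)


def dy_dx_numeric(x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dy/dx at x (an approximation)."""
    return (y(x + h) - y(x - h)) / (2.0 * h)


def residual(x: float) -> float:
    """Difference between the estimated dy/dx and the right side 6 y^2 x."""
    return dy_dx_numeric(x) - 6.0 * y(x) ** 2 * x


if __name__ == "__main__":
    print("y(1) =", y(1.0))  # initial condition, should equal 1/25
    for x in (-2.0, 0.0, 0.5, 1.0, 2.0):
        print(x, residual(x))
```

The sample points are chosen well inside the interval where the denominator 28 − 3x² stays positive.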

The question is, why can we **move differentials around** so freely, and why can we **integrate with respect to two different variables** and expect everything to work out?

Doctor Fenton answered:

Hi Vaneeta,

That’s a very good question, which books don’t usually explain, but rather give the procedural steps.

To be separable, a differential equation must be in a form such as dy/dx = f(x)/g(y), where the right side can be separated into a product of two factors, each depending upon a single variable. The procedure starts with separating the variables

g(y) dy = f(x) dx

and then integrates the left side with respect to y, and the right side with respect to x. As you observe, this doesn’t seem to make sense.

If you read Paul’s lesson, you will have seen that he *does* start with a brief explanation along the lines of what we’ll be seeing here; but it deserves additional emphasis, as students are likely to skip past it, even when it is presented, because they want to get to the procedure. The procedure he described here is just what we did in our example.

In order to see the underlying reasoning, we have to slow down and write things a little differently:

However, the variable y is conceptually dependent upon x, so we can write y(x) instead of just y, and avoid treating the derivative dy/dx as a “fraction” (so we can “multiply by dx”). Then we can write the DE as

g(y) dy/dx = f(x) ,

in which both sides are explicitly functions of x. Now we can integrate both sides with respect to x:

∫ g(y(x)) dy/dx dx = ∫ f(x) dx .

This could also be written without anything looking like a fraction at all, as $$\int g(y(x)) y'(x) dx = \int f(x) dx$$

All of this makes sense. Now, on the left side, we can make a substitution u = y(x), for which du = dy/dx dx, and the left side becomes

∫ g(u) du .

But this is an indefinite integral, so if we can find an antiderivative G(u) for g(u), we have

G(u) = ∫ f(x) dx .

If F(x) is an antiderivative for f(x), then

G(u) = F(x) + C .

We discussed substitution in an integral in Integration by Substitution. There we see why treating \(du\) and \(dx\) in an integral as entities in themselves (differentials) makes sense.

Don’t forget that *u* and *x* are two different, but related, variables; we don’t want to make the mistake discussed two weeks ago in Two Integration Puzzlers!

But u = y(x), so this is really an equation with the independent variable x,

G(y(x)) = F(x) + C,

and this is often written

G(y) = F(x) + C ,

with the explicit dependence of y on x suppressed, so that y is treated as the dependent variable again.

In our example above, \(G(y) = -y^{-1}\) and \(F(x) = 3x^2\). This was our implicit equation for *y*.
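To make the correspondence concrete, here is the same justification written out for the example equation. This is only a sketch following the steps just described; it assumes \(y\ne 0\) so that we can divide by \(y^2\), and \(u\) is the substitution variable from the explanation above.

$$\frac{dy}{dx}=6y^2x\;\Rightarrow\; y(x)^{-2}\,y'(x)=6x\\ \int y(x)^{-2}\,y'(x)\,dx=\int 6x\,dx$$

Substituting \(u=y(x)\), so that \(du=y'(x)\,dx\), the left side becomes an integral in \(u\) alone:

$$\int u^{-2}\,du=\int 6x\,dx\\ -u^{-1}=3x^2+C\\ -y(x)^{-1}=3x^2+C$$

At no point was \(\frac{dy}{dx}\) treated as a fraction; both sides were integrated with respect to *x*, and the substitution did the rest.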

The usual description omits this use of the substitution process to evaluate the integral on the left side, and just treats the derivative dy/dx as a fraction (which we have usually pointed out many times that it is not!) and “multiplies by dx” giving

g(y) dy = f(x) dx

(instead of g(y) dy/dx = f(x) ) and just uses y as a new variable instead of using a new letter u and making the explicit substitution. You still wind up with the same integration to do,

∫ g(u)du or ∫ g(y)dy ,

but the formal approach is quicker by omitting these steps, and gives the same result:

G(y) = F(x) + C

(which is often an implicit function defining y in terms of x).

This is all it takes to support the method, which is just a shortcut for the substitution. And the fact that we can do this is part of the justification for using the notation \(\frac{dy}{dx}\), as discussed in Why Do People Treat dy/dx as a Fraction? and What Do dx and dy Mean?.

Back in April, we had received a very similar question from Kalyan, who related the process directly to the definitions of the derivative and of the (definite) integral, by way of limits:

Hello Doctor,

How is separable variable justified?

dy/dt = t/y

How do we cross multiply here?

lim Δt→0 (Δy/Δt) = t/y

If we remove limit from there,

Δy + εΔt = (t/y)Δt

yΔy + yεΔt = tΔt

How is yεΔt removed and

yΔy = tΔt

As Δt→0 , Δy→0

so, ΣyΔy = ΣtΔt

therefore, ∫ ydy = ∫ tdt

Is this how I should think about this?

Areas of both sides is equal, how?

Kalyan is using a simple example (with variables *t* and *y* rather than *x* and *y*), rather than looking for a general proof; that’s a good idea when you are trying to understand a concept.

What he sees is that we can’t really just cross-multiply to get \(\int y dy = \int t dt\), because the derivative isn’t really a fraction. Rather, it is a *limit* of a fraction. So, how can you justify what appears to be cross-multiplying? And then, how can you integrate (which can be thought of as finding an area) with these limits and deltas in the way?

In terms of the limit defining \(\frac{dy}{dt}\), the differential equation can be thought of as saying that \(\frac{\Delta y}{\Delta t}\) differs from \(\frac{t}{y}\) by a quantity \(\epsilon\) that approaches zero as \(\Delta t\) approaches zero, so that $$\frac{\Delta y}{\Delta t}+\epsilon = \frac{t}{y}\\ \Delta y+\epsilon\Delta t = \frac{t}{y}\Delta t\\ y\Delta y+y\epsilon\Delta t = t\Delta t$$ By this approach, he hopes to show that \(y\epsilon\Delta t\) approaches zero. Unfortunately, the integrals will be hard to fit into a proof, as it is the **definite** integral, not the **indefinite** integral, that is a limit of a summation, and the summations themselves are troublesome, as we’ll see. We won’t be completing this proof, because the other is so much cleaner. But it does give us a little more to discuss.

Doctor Fenton gave a similar answer to the later one above, focusing on the better way to explain the method, rather than Kalyan’s idea:

The way I think about this is the following. The equation

dy/dt = t/y

assumes that y is a differentiable function of t, y(t), and the derivative of this function is t/y(t).

Then, multiplying by y,

y(t) dy/dt = t ,

so the two functions on the two sides are the same. Then integrating gives

∫ y(t) dy/dt dt = ∫ t dt .

If we let u = y(t) be a new variable in the integral on the left side, by the Substitution Theorem the left side becomes

∫ u du = ∫ t dt ,

so integrating gives u²/2 = t²/2 + C . But since u = y(t), this is y(t)²/2 = t²/2 + C.

Since we already know that y is a function of t, we can just write y²/2 = t²/2 + C.

This is the same procedure we saw before. There is really no cross-multiplication to justify, and no integration with respect to different variables.

But we could also just look at the following as a formal process. Instead of explicitly introducing the new variable u, just think of y as a new variable and write the integrands

y dy = t dt ,

and integrate to y²/2 = t²/2 + C .

You can just work formally (i.e. just manipulate symbols), and avoid explicitly invoking the Substitution Theorem.

To finish, we can now solve for *y* by multiplying by 2 and taking the square root: $$\frac{y^2}{2}=\frac{t^2}{2}+C\\ y^2 = t^2 + 2C\\ y=\pm\sqrt{t^2+2C}$$ Given a particular value of \(y(t)\), we would determine the appropriate sign and constant. For example, if \(y(0) = -3\), we would find that \(C=\frac{9}{2}\) and the sign must be negative, so that the solution is \(y=-\sqrt{t^2+9}\). To check, $$y'=-\frac{t}{\sqrt{t^2+9}}=\frac{t}{y}$$
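Another way to gain confidence in the result is to integrate the differential equation numerically, without separating anything, and compare with the formula. The following is a rough sketch rather than part of the original discussion: it uses the classical fourth-order Runge-Kutta method (a technique swapped in here purely for checking), and the names `rk4`, `rhs`, and `exact` are hypothetical.

```python
import math


def rk4(rhs, t0, y0, t_end, steps):
    """Integrate dy/dt = rhs(t, y) from t0 to t_end with classical RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def rhs(t, y):
    """Right side of the example equation dy/dt = t / y."""
    return t / y


def exact(t):
    """Solution obtained by separation of variables with y(0) = -3."""
    return -math.sqrt(t * t + 9.0)


if __name__ == "__main__":
    approx = rk4(rhs, 0.0, -3.0, 2.0, 200)
    print(approx, exact(2.0))
```

Starting from y(0) = −3, the numerical solution stays negative and should agree closely with \(-\sqrt{t^2+9}\), which is consistent with choosing the negative square root.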

Kalyan’s concern was with his own approach, and whether it could be made to work. He asked again, adding some details:

Hello Doctor,

Is my thought using

[lim Δt→0 (Δy/Δt)] = t/y

Δy + εΔt = (t/y)Δt

yΔy + yεΔt = tΔt

as ε decreases at a faster rate than Δt so I can remove that from the equation, what remains is

yΔy = tΔt

If I do a summation of the terms in the equation on both sides of it with appropriate limit of approaching infinity, does that not reduce to an integral where I can take the upper limit as x?

Wrong?

Again, notice that the particular summation is not specified, and involves two *different* sums, not to mention that summation would yield a definite integral, not the indefinite integral he shows.

Kalyan then added information about the background of his thinking:

I got this above idea from removing the limits from an expression.

For example –

[f(x+Δx) – f(x)] = f '(x)Δx + εΔx

as I have removed the limit the expression would have a small error called epsilon. so,

∑ [f(x+Δx) – f(x)] = ∑ [f '(x)Δx + εΔx]

As, ε is quite small, and Δx is also quite small, provided I would not take the limit to make it approach to a value that we call a limiting value. Will ∑εΔx approach zero? That is my question.

We’ll take a look at this below, though it is not directly related to our topic. This example deals with the **Fundamental Theorem of Calculus (FTC)**, and involves *definite* integrals, with the same summation on both sides, so that it doesn’t run up against the difficulty Doctor Fenton is emphasizing.

Doctor Fenton reiterated what he had said, emphasizing the key issue that Kalyan had not dealt with, namely that the summations he is introducing on each side are different (which was Vaneeta’s main concern as well):

Remember that the original post was about justification for the separation of variables for differential equations. I gave you a justification for that, using the substitution theorem.

You are approaching from a differential viewpoint using difference quotient approximations to the derivative. You have a formula in y on one side and a formula in t on the other (actually you have Δy+εΔt on the left side, and tΔt on the right). You are trying to justify ignoring the εΔt on the left without taking a limit. But the problem is that you still have a formula in y on the left side and one in t on the other.

How do you justify integrating with respect to y on the left side while integrating with respect to t on the other?

What I pointed out is that the DE dy/dt = f(t)/g(y) means that y = y(t) is a function of t. If you write

g(y(t)) dy(t)/dt = f(t),

then both sides are functions of t, and you can integrate both sides with respect to t. If G'(y) = g(y), then d/dt [G(y(t))] = g(y(t)) dy/dt, so

∫ g(y(t)) y'(t) dt

can be evaluated as ∫ g(y) dy, either by using the Chain Rule (saying that if y = y(t), then d/dt [G(y)] = dG/dy * dy/dt) or the Substitution Rule.

That’s why separation of variables actually works. Because of the Substitution, an integral with respect to t is equivalent to an integral with respect to y if the integrand is of the correct form: if G'(y) = g(y), then d/dt [G(y)] = g(y) dy/dt, and the integral

∫ g(y) dy/dt dt = G(y(t)).

It’s conceivable that one could express all this in terms of the definitions, by incorporating the proofs of the substitution rule and other facts used here; but this is why we prove theorems from theorems! Basing everything directly on the basics can get impossibly complicated, and is unnecessary.

Subsequently, Kalyan started a new thread to ask about the Fundamental Theorem of Calculus proof he had based his ideas on:

f(x+Δx) – f(x) = f ‘(x)Δx + εΔx

As I have removed the limit, the expression would have a small error called epsilon.

So, ∑ [f(x+Δx) – f(x)] = ∑ [f ‘(x)Δx + εΔx] .

As ε is quite small, and Δx is also quite small, provided I would not take the limit to make it approach to a value that we call a limiting value. Will ∑εΔx approach zero? That is my question.

Technically, the summations need to be specified: For what values of *x* are these terms being summed? But since in this case they are the same on both sides (and we are doing a definite integral, so it really *is* a summation), Kalyan’s specific question about the deltas is the main obstacle, and we can start with that.

Doctor Rick responded:

Hi Kalyan,

Doctor Fenton has already explained how and why separation of variables works mathematically. You have asked for input about your specific approach, so I will make some relatively informal comments about the idea.

You are going back to the idea of integration as the limit of a summation. Without being rigorous, let’s consider how it will work for a typical well-behaved function, one that has all derivatives in the domain over which you are to integrate. Then the function f(x) can be expanded as a power series in Δx:

f(x+Δx) = f(x) + f '(x) Δx + f ''(x) (Δx)² + …

Now we can write

f(x+Δx) – f(x) = f '(x)Δx + (f ''(x) + f '''(x) Δx + …)(Δx)²

This power series provides a more clearly defined way to talk about an error that “decreases faster” than something else, as we’ll see.

Compare this to your

f(x+Δx) – f(x) = f ‘(x)Δx + εΔx

and you will see that your ε is some quantity (depending on Δx, but presumably finite) multiplied by Δx:

ε = (f ”(x) + f ”'(x) Δx + …)Δx

Let’s call the quantity in parentheses δ; it depends on x and Δx, but we can expect it to be bounded. Then εΔx = δ(Δx)². As Δx is decreased toward 0, (Δx)² decreases faster.

That’s what you were trying to say earlier when you wrote, “ε decreases at a faster rate than Δt.” It isn’t that ε itself approaches 0 more rapidly than Δx, but it does approach 0, and the term εΔx (the difference between Δy and dy/dx Δx) approaches 0 more rapidly than Δx. That’s what we need to know.

So far, we have clarified the basis of Kalyan’s informal argument, and his use of epsilon. To deal with the integral, we need to be more specific about what the summations mean:

Now, let’s examine the summation you ask about:

∑ [f(x+Δx) – f(x)] = ∑ [f '(x)Δx + εΔx]

We understand this as shorthand for a summation that, when the limit is taken, will turn into a definite integral (from x = a to x = b). That summation is over “slices” of equal width Δx. On the left we have

(f(a+Δx) – f(a)) + (f(a+2Δx) – f(a + Δx)) + … + (f(b) – f(b – Δx))

which is a telescoping series equal to f(b) – f(a). On the right, the first term of the sum becomes, in the limit, \(\int_a^b f'(x)\,dx\).

If you are not familiar with telescoping series, you will find some examples in the post Summing Squares: Finding or Proving a Formula.

The Fundamental Theorem of Calculus, which this work leads to, says that $$\int_a^b f'(x)\, dx = f(b) - f(a)$$

What about the second term, ∑ εΔx? That is what you are asking about. The ε, as I mentioned above, is proportional to Δx: ε = δΔx. Thus the sum is

∑ εΔx = ∑ δ(Δx)² = (Δx)² ∑ δ

The sum ∑ δ, being over n = (b – a)/Δx terms, varies as 1/Δx, so that the entire sum ∑ εΔx varies as Δx and hence goes to 0 in the limit as Δx goes to 0.

I think this answers your question.

So his basic idea of using epsilon has merit, though his attempt at justifying separation of variables failed to deal with the basic issue of integrating with respect to two different variables.
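The claim that ∑ εΔx shrinks in proportion to Δx can be watched numerically. The sketch below is an informal illustration, not a proof: the sample function f(x) = x³ on [0, 1] and the helper names `sum_eps_dx`, `cube`, and `cube_prime` are choices made here, and ε is computed at the left end of each slice, matching the summation described above.

```python
import math


def sum_eps_dx(f, fprime, a, b, n):
    """Sum eps*dx over n equal slices of [a, b], where for each slice
    eps = (f(x + dx) - f(x)) / dx - f'(x) at the slice's left end x."""
    dx = (b - a) / n
    terms = []
    for i in range(n):
        x = a + i * dx
        eps = (f(x + dx) - f(x)) / dx - fprime(x)
        terms.append(eps * dx)
    return math.fsum(terms)


def cube(x):
    """Sample function f(x) = x^3 (an assumption for illustration)."""
    return x ** 3


def cube_prime(x):
    """Its derivative f'(x) = 3x^2."""
    return 3 * x ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, sum_eps_dx(cube, cube_prime, 0.0, 1.0, n))
```

For this particular function the sum works out in exact arithmetic to (3n − 1)/(2n²), roughly 1.5Δx, so each tenfold increase in the number of slices should cut the printed value by about a factor of ten.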

One of our first posts, in 2018, was about zeros in long division. But we still get many questions about this issue, and it’s time to dig in deeper. We’ll look here at two of them, answering the twin questions, “When do you put a zero in the quotient … and when do you not?”

The first is from a student who called himself CodedGamer, in late May:

Recently when I was solving some exercise problems in my book, I came across a situation where I had to divide 245/8, well at first this looked quite simple and usual to me but then when I arrived at the answer and cross checked it, it seemed like the answer was wrong. So I once again checked my answer and still got 3.625 and then being confused when I did it on my calculator I got the answer as 30.625.

The problem is I don’t understand why you add a zero before .625. Usually the cases for adding a zero before a decimal place is different from the one in this question.

The way I went about solving the problem:

    8 | 245 | 3.625
     (-)24
         050
       (-)48
           20
        (-)16
            40
         (-)40
             0

I also checked on this article https://www.themathdoctors.org/long-division-when-zero-gets-in-the-way/ but did not get any help from it.

All I need is a precise explanation on why we add a 0 before .625 which seems unusual to me.

The error here is the same type discussed in the old post; but it is easy for many students to miss, especially if they learned the method by rote and picked up a wrong idea of when and why you need to write a zero. The reality is that “before a decimal” makes no difference; so we can guess that Gamer is tripping over other things that happen there.

Doctor Fenton answered, starting with alternative methods for doing the same division:

Dr. Peterson’s explanation is worth reading more than once, if you don’t understand it the first time, but I will try a couple of other approaches to approach the problem.

One is that 245 divided by 8 can be written as a fraction, 245/8, and we can write

$$\frac{245}{8}=\frac{240+5}{8}=\frac{240}{8}+\frac{5}{8} = 30 + \frac{5}{8} = 30 + 0.625 = 30.625$$

Sometimes the easiest way to do a division is to treat it as a fraction! In this case, it makes the zero hard to miss.

Another way to see this is to look at a similar problem:

      __31_____
    8) 255.000
    (-)24↓
        15
      (-)8
         7
         :

The problem occurs at this stage. After multiplying the 3 in the quotient times 8 to get 24, which is then subtracted from the first two digits, 25, you have a remainder of 1, and you bring down the next digit, 5, so that you then have 15 as the new (partial) dividend. 8 goes into 15 1 time, with a remainder of 7, and so on.

The new quotient, 1, goes into the 1’s place in the overall quotient for the problem.

Note that he is using the style of long division in which the quotient is placed above the dividend, rather than to the right as Gamer did it. (Wikipedia indicates that this style is used in the English-speaking world; I have never determined where Gamer’s style is standard, but we see it often. I’ll call the former “American style”.) As we’ll see later, this can help keep track of digits better. But the key is that the new digit goes into the same place as the digit “brought down”.

Compare this with your problem:

      __3?_____
    8) 245.000
    (-)24↓
        05

When you bring down the 5 (from the 1’s place in the dividend), you need to put the new quotient in the 1’s place in the quotient for the problem. Here, your remainder from the previous step was 0, so when you bring down the 5, the new dividend at this step is 5, which is smaller than the divisor, 8. You are just ignoring this problem and bringing down another digit, from the tenths place in the decimal to make a new dividend of 50. But that is not the algorithm. The division of 5 by 8 should be thought of as 5 = 0*8 + 5: that is, 5 divided by 8 is 0, with a remainder of 5, so the quotient at the second stage is 0, and that quotient is to be put into the 1’s place in the quotient for the problem. You must not ignore quotients of 0 in the intermediate steps.

A zero is not nothing! It *must* be written into the quotient like any other digit.

A more extreme example is

        __2009
    12 ) 24108
      (-)24↓↓↓
          01↓↓
       (-)00↓↓
           10↓
        (-)00↓
           108
        (-)108
             0

You cannot just ignore 0 quotients in the intermediate steps in the algorithm.

Does this help?
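The rule being described, one quotient digit for every digit brought down, is easy to see when the algorithm is written as a loop. The following is a minimal Python sketch for whole-number dividends; the function name `long_division_digits` is hypothetical, and it deliberately records every quotient digit, including a leading zero that one would normally not bother to write.

```python
def long_division_digits(dividend: int, divisor: int):
    """Return (quotient_digits, remainder), recording one quotient digit
    for every dividend digit brought down, zeros included."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    digits = []
    remainder = 0
    for ch in str(dividend):
        # Bring down the next digit next to the previous remainder.
        partial = remainder * 10 + int(ch)
        # Divide; the digit may well be 0, and it is still recorded.
        q, remainder = divmod(partial, divisor)
        digits.append(q)
    return digits, remainder


if __name__ == "__main__":
    print(long_division_digits(245, 8))
    print(long_division_digits(24108, 12))
```

For 245 ÷ 8 this gives the digits [0, 3, 0] with remainder 5, that is, 30 remainder 5; for 24108 ÷ 12 it gives [0, 2, 0, 0, 9] with remainder 0. Dropping the zero in the ones place of the first result is exactly the mistake that turns 30 into 3.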

Gamer responded:

Thank you for responding and taking my question into consideration.

I clearly understood the step where you say 5 divided by 8 is probably 5 = 0*8 + 5 with zero as the quotient and 5 as the remainder.

But let’s take another problem and compare 245/8 with it:

    8 | 236 | 29
        16 (-)
         76
         72 (-)
          4

So let’s pause the problem here, here we are left with the remainder as 4 and the quotient as 29. After this I would go on and add a decimal place to the quotient. And then I can add a zero to the remainder which would become as 40 after which I can make the quotient as 29.5 and end the problem.

My point is in the case of the second example you gave me (24108/12) we add a zero to the quotient to bring another digit from the dividend to the remainder. But in the case of 245/8, when we add a decimal to the quotient; doesn’t that mean we can actually bring another digit down (in our scenario, zero) to the remainder?

I am still not sure if I understand long division properly because when I approached different sites and people and asked them about this problem, they were able to tell the right answer but not explain how they got it right. So the thing I expect is why isn’t the division of 236/8 not similar to 245/8.

Doctor Fenton replied,

Every time you “bring down” another digit, it comes down from a decimal position in the dividend: the 1’s place, the 10’s place, the hundred’s place, or the tenth’s place, the hundredth’s place, etc. The number brought down becomes the units digit in the new dividend at that level, and you have to put the quotient of the new division in the corresponding place in the quotient on top.

When you bring down a digit from the ones place, you have to put a digit in the ones place of the quotient. To put a digit in the ones place, you have to have brought a digit down from the ones place.

In 236/8, you got the integer part of the quotient as 29, but there is a remainder of 4, which is in the same column as the 6 which was brought down to make 76. You can stop at this point and say the answer is 29 remainder 4, or you can keep going to get a decimal answer. If you want to keep going, the next digit comes from the 0 in the tenth’s place of 236 = 236.0. Bringing down that 0 makes the new dividend 40, and the quotient of 40/8 is 5, with no remainder, so you put the 5 in the tenth’s place of the quotient on top: 29.5. Every time you bring down a number to make a new dividend at that level, you have to put the quotient, even if it is 0, in the decimal position that the number brought down came from.

Gamer had not mentioned bringing zeros down, after the decimal point; that may be part of what he is missing. We are still following the same process.

You write

But in the case of 245/8, when we add a decimal to the quotient; doesn’t that mean we can actually bring another digit down (in our scenario zero) to the remainder?

Every time you bring down a number from the original dividend, you have to put a digit in the same position in the quotient. You can’t skip a position and bring down two digits at one time. In 245/8, when you bring down the 5, it came from the units position in the original dividend, and the new dividend is 05, and 5 = 0*8 + 5 (exactly, not probably!). The new quotient at this step is 0, and the remainder is 5, so you have to put a 0 in the units place of the quotient for the problem. Then you can bring down a zero from the tenth’s place to make a new dividend of 50, but now 50 = 6*8 + 2, so you put a 6 in the tenth’s place. Then you bring down the 0 from the hundredth’s place to make a dividend of 20, and so on.
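For readers who like to see an algorithm spelled out, here is a minimal Python sketch of the rule "one quotient digit for every digit brought down, even if it is 0". The function name `long_division` and its `decimal_places` parameter are my own choices for illustration. Note also that this sketch starts from the very first digit (so it writes a leading 0 that we would normally not bother to write), whereas by hand we start with 24.

```python
def long_division(dividend: int, divisor: int, decimal_places: int = 3) -> str:
    """Divide digit by digit, writing one quotient digit per digit brought down."""
    # Digits of the dividend, followed by the unwritten zeros after the decimal point.
    digits = [int(d) for d in str(dividend)] + [0] * decimal_places
    quotient_digits = []
    remainder = 0
    for d in digits:
        partial = remainder * 10 + d              # "bring down" the next digit
        q, remainder = divmod(partial, divisor)   # q may be 0
        quotient_digits.append(str(q))            # written even when q == 0
    n = len(str(dividend))                        # digits left of the decimal point
    whole = "".join(quotient_digits[:n]).lstrip("0") or "0"
    frac = "".join(quotient_digits[n:])
    return f"{whole}.{frac}" if frac else whole


print(long_division(245, 8))         # 30.625
print(long_division(24108, 12, 0))   # 2009
```

If the line that appends `q` skipped zeros, 245/8 would come out as 3.625, which is exactly the answer Gamer first got.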

Gamer had also asked his question as a comment on the post he referred to:

I came across a similar situation in long division where I had trouble with understanding zeros. In the problem 245/8 I went about solving it the usual way and got the answer as 3.625 which was wrong when I checked it with my calculator, the correct answer was 30.625. The thing I would like to know is what is the meaning of the zero before the decimal point.

I answered him there:

Hi, Gamer.

The key idea is that each time you divide by the divisor, you must put the quotient digit in the answer, even if it is a zero.

In your example, you first divide 24 by 8, and write the quotient, 3, in the answer (specifically, in the tens place, as 24 is in the tens place of the dividend – that is, it means 24 tens, not 24 ones). After multiplying and subtracting, the remainder is 0.

You then bring down the next digit, 5, and append it to the remainder to get 05. When you divide this by 8, you get 0 (because 5 < 8), and you must write this 0 in the answer. It is needed because we need a ones place in the answer, corresponding to 5 being in the ones place of the dividend.

After that, you can continue. Since the next digit you bring down is a 0 from the tenths place, the quotient you get, 6, must go in the tenths place of the answer.

So, what is the meaning of the zero? It can be considered a place holder, to ensure that the 3 you got represents 3 tens. That is because when you divided 24 by 8, you were really dividing 240 by 8, and the quotient is 30, not just 3.

We can express the work algebraically. To divide 245/8, we treat it as (240 + 5)/8 = 240/8 + 5/8 = 30 + 5/8. This corresponds exactly to what I just described.

Compare a similar problem, 236/8. Here we treat it as (160 + 76)/8 = 20 + 76/8; then we treat 76/8 as (72 + 4)/8 = 72/8 + 4/8 = 9 + 0.5. So the answer is (160 + 72 + 4)/8 = 160/8 + 72/8 + 4/8 = 20 + 9 + 0.5 = 29.5. The difference between the two problems is merely that there is a 0 in the ones place in the first example.

Does that help at all?

At this point, Gamer responded:

Thank you for responding and considering my question.

After reading Doctor Peterson’s blog post twice I pretty much understood the algorithm and the way long division works. Also I would like to thank Doctor Fenton for helping me out on this. I guess now I am quite familiar with the way long division works.

The reason all this looked different to me might be because of the way I learnt long division back in school. As in school they teach us like “follow this step and you will get the answer” which I feel is quite wrong because you don’t actually see or understand the thing that is happening under the hood. Like I never knew the reason for adding a zero in the quotient to bring down another digit from the dividend until I read Doctor Fenton’s reply.

Once again I would like to thank Doctor Fenton and Doctor Peterson for helping me out on this.

Just what we hope for!

A similar question with different particulars had come to us from Mekhael back in March:

I want to ask about 34 739 divided by 3.

My answer is 115 790.6 but calculator says 11 579.6 .

Why is there no zero after nine? We usually put a zero before putting a decimal. I’m stuck. Can you please help.

MY WORK:

3 ) 34739 ( 115790.6
    3
    04
     3
     17
     15
      23
      21
       29
       27
        20
        18
         2

Mekhael makes the exact opposite error to Gamer, adding an **extra** zero, evidently thinking the decimal point requires it.

Doctor Fenton answered:

Hi Mekhael,

I think you are losing track of the decimal point in your computation.

Were you ever taught to use estimation to get an idea of the answer to a problem? Your computation is telling you that you are dividing 34,739 into three parts, and one of those parts is over 115,790, over three times the original amount. To check the reasonableness of the answer, you can compare your problem to an easier one, say dividing 33,000 by 3. That is clearly 11,000, so your answer should be a bit larger than 11,000, not over 100,000.

We can check a division by estimation, or by multiplication. Both of our “patients” have checked only by using a calculator, rather than by thinking about meaning.
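Both checks are easy to mimic in a few lines of Python. This is only a sketch: the function name `multiplication_check` and the tolerance rule (one unit in the last written decimal place, times the divisor) are my own assumptions about what "close enough" means for a quotient that was cut off after a few decimals.

```python
from fractions import Fraction


def multiplication_check(dividend: int, divisor: int, claimed: str) -> bool:
    """True if claimed * divisor lands within one last-place step of the dividend."""
    q = Fraction(claimed)                          # exact value of the decimal string
    places = len(claimed.split(".")[1]) if "." in claimed else 0
    tolerance = Fraction(divisor, 10 ** places)    # divisor times one last-place unit
    return abs(q * divisor - dividend) < tolerance


# Estimation, as in the reply: 33,000 / 3 is an easy nearby problem.
print(33_000 // 3)                                      # 11000

# Multiplication: undo the division and compare with the dividend.
print(multiplication_check(34739, 3, "115790.6"))       # False
print(multiplication_check(34739, 3, "11579.6"))        # True
```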

I think I can show you the error with a simpler example: 37 divided by 3. If you use division with remainders, you write

    12
3 ) 37
    3
     7
     6
     1

which you write as 37/3 = 12 R 1 or 12 remainder 1.

But if you want a decimal answer, you keep going:

    12.33
3 ) 37.00
    3↓ ↓↓
    07 ↓↓
     6 ↓↓
     1 0↓
       9↓
       10
        9
        :

In the remainder version, the remainder must always be less than the divisor, but if you want to keep going to get the decimal answer, you don’t put 0 into the quotient. You write the decimal point, and bring down the first digit after the decimal point and continue.

Once again, notice that showing the decimal point and following zeros in the dividend demonstrates that the same process continues.

For your problem, you could start by thinking

    ?????.??
3 ) 34739.00

The dividend has five places to the left of the decimal point, so the quotient will have at most five places to the left of the decimal point. If you line up the quotient above the dividend, you can put the decimal point in the quotient position before you even start the computation. After you have done the division where you brought down the 1’s digit (in your computation, that is the line where you divide 29 by 3, putting a 9 in the 1’s digit of the quotient, leaving a difference of 2), that 2 can either be the remainder, or you can put a decimal point in the quotient and continue the division, bringing down the first digit after the decimal point.

Does this help?

As we’ve seen, the American style makes it easier to put digits in the right place.

Mekhael wrote back,

If we compare the above question with another question, which is 21 divided by 2, the answer is supposed to be 10.5.

Why do we put 0 before decimal in this question and do not put a 0 in the previous question where the answer is 1579.6. Why is it not 15790.6?

Doctor Fenton replied,

Writing

    1?.?
2 ) 21.0
    2↓
    01

the 1 we just brought down is the last digit before the decimal point, and the difference of the first step was 0, so the next step divides 2 into 1. 2 goes into 1 0 times, so we have

    10.?
2 ) 21.0
    2↓
    01
    00
     1

That 1 is now the remainder, so you can write 21/2 = 10 R 1 (10 remainder 1), or you can keep going to get a decimal answer:

    10.?
2 ) 21.0
    2↓ ↓
    01 ↓
    00 ↓
     1 0

and the quotient now continues after the decimal point. 2 goes into 10 5 times, so

    10.5
2 ) 21.0
    2↓ ↓
    01 ↓
    00 ↓
     1 0
     1 0
       0

and the division terminates.

Last time there was no *reason* for a zero; this time there is. *That* is the difference!

You have been writing successive digits in the quotient as quotients of smaller problems, and the 0 before the decimal is one of these quotients. You haven’t arbitrarily inserted a 0. If the problem had been 23/2, we would write

    11.5
2 ) 23.0
    2↓ ↓
    03 ↓
    02 ↓
     1 0
     1 0
       0

Do you see the difference? Now there is a 1 in the units place of the quotient, because the 3 in the units place in the dividend gives a digit of 1 in the quotient because 3 is larger than 2. When there was a 1 in the units place of the dividend, the divisor 2 is larger than 1, so the digit it contributes to the quotient is 0, which we write in the units digit of the quotient. There is nothing arbitrary about this step.

Mekhael tried to summarize:

I just came to know that if we take down a digit from the original number and it’s small then only we add a zero but if we have the remainder that is small then we don’t put a zero, is that correct? Am I getting it right?

Doctor Rick joined in to clarify:

Hi, Mekhael. I am going to add some thoughts.

Here is how I would say what I think you are saying. When you find the remainder, the remainder must be small, in the sense that if it’s greater than the divisor, you picked too small a digit for the quotient. But if, after you’ve brought down the next digit from the dividend, what you’ve got is still less than the divisor, then the divisor goes into that number 0 times, and you write this 0 as the next digit of the quotient.

The issue isn’t a small digit brought down, or a small remainder, but a zero quotient.

Here’s your work on the original problem up to the point just before you added the 0 incorrectly:

3 ) 34739 ( 11579
    3
    04
     3
     17
     15
      23
      21
       29
       27
        20

Here you had a remainder of 2, which is less than 3 as it should be. Then you brought down a zero (the unwritten tenths digit of the dividend), making 20 as your new partial dividend. Dividing 20 by 3, you get 6, which you then write as the next digit of the quotient. (You need to remember to put the decimal point in the quotient first. This is easier to remember when you put the quotient above the dividend, as Doctor Fenton has been doing, rather than at the right.)

Again, using the American style and writing the decimal point in the dividend helps you remember.

You would only write a 0 in the quotient if you say, “3 doesn’t go into my partial dividend.” You don’t ask that question when all you have is the remainder from the previous step; you wait until after you brought down a new digit.

This really has nothing to do with where the decimal point is; I think the real reason you got confused is that you brought down a 0. Here’s another problem where you need to bring down a 0:

3 ) 34039 ( 11346 r 1
    3
    04
     3
     10
      9
      13
      12
       19
       18
        1

When I brought down the 0, I didn’t write a 0 in the quotient. I saw that 3 goes into 10 3 times, so I put 3 into the quotient.

Again, it is not bringing down a zero, but having a small partial dividend, that results in a zero.

Here is one more example, where you do write a 0 in the quotient:

3 ) 317 ( 105 r 2
    3
    01
     0
     17
     15
      2

When I brought down the 1, I had a partial dividend of 01, or 1, which is less than 3. Since 3 does not go into 1 (that is, it goes in zero times), I wrote 0 in the quotient.

That’s the only time you should write 0 in the quotient!

Gamer made a different mistake:

Students probably have more trouble remembering to write the 0 than writing 0 when they shouldn’t. That’s because we often shorten the work to this:

3 ) 317 ( 105 r 2
    3
    017
     15
      2

Since subtracting 0 from a number doesn’t change the number, we can save work by not writing the subtraction. But we’re still doing it, and we still need to write our quotient digit 0.

I hope this helps!

The more you write, the safer you are; the less you write, the more you have to think!
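To see Doctor Rick’s point in one place, here is a small Python sketch that records, for each step, the digit brought down, the partial dividend it produces, and the quotient digit. The name `division_steps` is mine, and the sketch handles only the whole-number part of the division. Comparing the two examples shows that the 0 in the quotient comes from a small partial dividend, not from bringing down a 0.

```python
def division_steps(dividend: int, divisor: int):
    """Return ([(digit brought down, partial dividend, quotient digit), ...], remainder)."""
    steps = []
    remainder = 0
    for ch in str(dividend):
        partial = remainder * 10 + int(ch)        # remainder with the new digit appended
        q, remainder = divmod(partial, divisor)   # q is 0 exactly when partial < divisor
        steps.append((int(ch), partial, q))
    return steps, remainder


for n in (34039, 317):
    steps, r = division_steps(n, 3)
    print(n, steps, "remainder", r)
# 34039: bringing down the 0 gives partial dividend 10, quotient digit 3
# 317:   bringing down the 1 gives partial dividend 1,  quotient digit 0
```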

Mekhael concluded:

Thanks a lot Dr Rick and Dr Fenton. I found your prompt replies very helpful. And they did clear my concept. Thanks again.

Here is the first question, from Amia in mid-May:

Hi Dr Math

Can you tell me what is wrong with my solution?

Thank you

Let’s look closely at what Amia has done. First, she rearranged and made a substitution:

She divided the numerator and denominator by *x*, then defined a new variable \(u=\frac{1}{x}\) and made a substitution, which is not quite complete at this point, because both *x* and *u* are present.

I myself would have gone directly to an integral involving only *u* by observing that \(x=\frac{1}{u}\), so $$\frac{dx}{du} = -\frac{1}{u^2}\\dx=-\frac{1}{u^2}du$$ so we can immediately substitute and get $$\int\frac{\frac{1}{x}}{\frac{1}{x}+x}dx=\int\frac{u}{u+\frac{1}{u}}\cdot\left(-\frac{1}{u^2}\right)du=-\int\frac{1}{u^2+1}du$$

Amia got there with an extra step:

Now she has observed that this is just the negative of the original integral, except with variable *u* instead of *x*, and summarized the work so far in an equation, changing the variable and then solving for the integral:

But clearly the integral can’t be zero!

(In fact, you may recognize the integral as a standard one, giving the inverse tangent. When I showed this discussion to a colleague during a lull in Zoom tutoring, she saw it that way immediately, blinding her to Amia’s work. It can be hard to see a student’s thinking when you are thinking too well yourself! I often consciously hold myself back from solving a problem to force myself to see through the student’s eyes.)

Having done all this thinking and a bit more, I replied:

Hi, Amia.

Thanks for a beautiful apparent paradox! I stared at it for a while, almost convinced, before suddenly realizing what’s going on here.

Let’s look at it:

The circled line looks much like something we commonly do in integration by parts, when we get back to the original integral.

The basic idea of what Amia does at this step, treating the integral we seek as a variable and solving for it, is not in itself invalid. Here is an example in which this is done in integration by parts: $$\int e^x\sin(x)dx$$

Integrate by parts, taking $$u = \sin(x), dv = e^x dx, du = \cos(x) dx, v = e^x$$ so that $$\int e^x\sin(x)dx = uv-\int v du =\\ e^x\sin(x) - \int e^x\cos(x)dx$$

Do it again, this time taking $$u = \cos(x), dv = e^x dx, du = -\sin(x) dx, v = e^x$$ so that $$\int e^x\cos(x)dx = uv-\int v du =\\ e^x\cos(x) + \int e^x\sin(x)dx$$

Putting these together, $$\int e^x\sin(x)dx = e^x\sin(x) - \left[e^x\cos(x) + \int e^x\sin(x)dx\right] =\\ e^x\left(\sin(x) - \cos(x)\right) - \int e^x\sin(x)dx$$

But the integral on the right is the same as that on the left; we can add it to both sides: $$2\int e^x\sin(x)dx = e^x\left(\sin(x) - \cos(x)\right)$$

Solving, we find that $$\int e^x\sin(x)dx = \frac{1}{2}e^x\left(\sin(x) - \cos(x)\right)$$

Indeed, if we differentiate this, we get $$\frac{d}{dx}\left(\frac{1}{2}e^x\left(\sin(x) - \cos(x)\right)\right) =\\ \frac{1}{2}e^x\left(\sin(x) - \cos(x)\right)+\frac{1}{2}e^x\left(\cos(x) + \sin(x)\right) = e^x\sin(x)$$
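If you would rather let a computer do the checking, here is a rough numerical version of the same test, using only Python’s standard library. It approximates the derivative of the claimed antiderivative by a symmetric difference quotient and compares it with the integrand at a few sample points; the helper names and the step size `h` are my own choices, and agreement at sample points is evidence, not a proof.

```python
import math


def F(x: float) -> float:
    """Claimed antiderivative: (1/2) e^x (sin x - cos x)."""
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))


def integrand(x: float) -> float:
    """The function being integrated: e^x sin x."""
    return math.exp(x) * math.sin(x)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) by (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.0, 0.7, 3.0):
    # The last two columns should agree to many decimal places.
    print(x, central_difference(F, x), integrand(x))
```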

Amia’s work is very similar, in that we got back to the same integral and solved for it … but notice what Amia did that was different:

But on the RHS, you have replaced an integral in u with an integral in x, just changing the name of the variable. This would be appropriate if these were definite integrals, as the variable in that case is just a “dummy variable”. But an indefinite integral is not just a number; it is a function of that variable! And in this context, you can’t just change the name of the variable.

For example, with **definite** integrals, it is true that $$\int_0^\pi e^x\sin(x)dx = \int_0^\pi e^u\sin(u)du$$ because the variable, *x* or *u*, appears only within the integral, rather than being a variable with a value “on the outside”. The value of each integral is a number (specifically, \(\frac{1}{2}(1+e^\pi)\)), which doesn’t depend on the variable used inside.
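Here is a minimal sketch of that idea in Python, using a hand-written Simpson’s rule (my own helper, `simpson`, with an arbitrary choice of 1000 subintervals). Whether the function handed to it calls its argument `x` or `u` makes no difference at all, and the number that comes out is close to \(\frac{1}{2}(1+e^\pi)\).

```python
import math


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)   # weights 4, 2, 4, ..., 4
    return total * h / 3


in_x = simpson(lambda x: math.exp(x) * math.sin(x), 0.0, math.pi)
in_u = simpson(lambda u: math.exp(u) * math.sin(u), 0.0, math.pi)
expected = 0.5 * (1 + math.exp(math.pi))

print(in_x, in_u, expected)   # the first two are identical; all three agree closely
```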

And in our integration by parts example, like Amia’s problem, we were working with **indefinite** integrals, but there we had the same variable all through the work, so everything was a function of *x*.

But with **indefinite** integrals, **the variable matters**. For example, \(\int 2xdx = x^2\), while \(\int 2udu = u^2\). These can be thought of as the same *function* of different *variables* (say, \(f(x)\) and \(f(u)\)), but they are *not* the same *expression*, or the same *value*:

If we write ∫1/(1 + x^2) dx = f(x), where f(x) is a function we are trying to determine, then ∫1/(1 + u^2) du = f(u), not f(x)! They are not the same quantity, or even the same function (of x).

So that circled line in your work should really be

f(x) = – f(u) + C

We can’t solve this for f(x), so the rest of your work is wrong.

Because *x* and *u* represent different numbers, we can’t say that \(f(x) = f(u)\), so we can’t change that line to \(f(x) = -f(x)+C\), which is in effect what Amia did.

(Note that I added “\(+ C\)” without comment, because all indefinite integrals implicitly have such an arbitrary constant added on; we’ll be seeing that become important momentarily! If you include “\(+ C\)” in the integration by parts example above, you’ll find that it doesn’t affect the results; we’d just have a different constant.)

But we do know how our *x* and *u* are related: they are reciprocals.

But we can do something interesting! Make the substitution again:

f(x) = – f(1/x) + C

This tells us a property of the function f:

f(1/x) = C – f(x)

And in fact, that property is true of the actual integral, which as you probably know is f(x) = \(\tan^{-1}(x)\). One property of this function is that

\(\tan^{-1}(1/x) = \pi/2 - \tan^{-1}(x)\) (for x > 0)

since the LHS is \(\cot^{-1}(x)\).

Note that here the constant *C* has become important. This would not work if we required *C* to be zero.
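A quick numerical look at this property, sketched in Python with the standard `math.atan` (the helper name `constant_C` is mine): for any positive *x*, the sum \(f(x)+f(1/x)\) comes out as \(\pi/2\). For negative *x* the sum is again constant, but the constant is \(-\pi/2\), which is one reason I restricted the identity to positive *x*.

```python
import math


def constant_C(x: float) -> float:
    """Value of f(x) + f(1/x) for f = arctan; x must be nonzero."""
    return math.atan(x) + math.atan(1 / x)


for x in (0.1, 1.0, 7.5):
    print(x, constant_C(x), math.pi / 2)      # same value each time

for x in (-0.1, -3.0):
    print(x, constant_C(x), -math.pi / 2)     # a different constant on this side
```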

There are some tricky details in my mention of the inverse cotangent, as different sources use different ranges, which lead to different identities, as explained in my post Ranges of Inverse Trig Functions. My restriction to positive *x* avoided these issues.

I am reminded of some similar apparent paradoxes discussed here:

1=0? Calculus Says So [or Not]

The usual issue is the omission of the constant of integration (which I quietly added above); here it is much more subtle (but also much more blatant once you see it).

Keep reading, and you’ll see the same issue again. But first, let’s close the loop and find the correct answer, though I’ve already mentioned it several times.

When one method of integration doesn’t work, we try another. In this case, we can do a different substitution, namely letting \(x=\tan(\theta)\). Then \(dx=\sec^2(\theta)d\theta\) and \(1+x^2=1+\tan^2(\theta) = \sec^2(\theta)\), so $$\int\frac{1}{1+x^2}dx=\int\frac{1}{\sec^2(\theta)}\sec^2(\theta)d\theta=\int d\theta = \theta + C = \tan^{-1}(x) + C$$

Or, we might just have recognized the integrand as the derivative of \(\tan^{-1}(x)\).

The second question, from Tony, came in an hour and a half after the first:

Why does the integral of (1 + x)^2 not equal the integral of (1 + 2x + x^2)?

(1 + x)^3/3 is not equal to x + x^2 + x^3/3

This may not seem important, but when calculating a constant of integration the difference does matter.

Let’s check out the claim:

- The two integrands are the same: $$(1+x)^2 = 1+2x+x^2$$
- The integration is correct in each case: $$\int(1+x)^2dx = \frac{1}{3}(1+x)^3$$ by substituting \(u=(1+x)\); $$\int(1+2x+x^2)dx = x+x^2+\frac{1}{3}x^3$$
- The first integral expands to \(\frac{1}{3}+x+x^2+\frac{1}{3}x^3\), which is not equal to the second. But having done that, you might get a clue to the answer!

Doctor Fenton answered this one, pointing out how close Tony came to answering his own question:

Hi Tony,

You mention determining the constant of integration in your question, but you ignore that constant in the calculations in your question.

If you include the constant of integration and write the integral of \((1 + x)^2\) as \((1 + x)^3/3 + C_1\) instead of \((1 + x)^3/3\), and the integral of \(1 + 2x + x^2\) as \(x + x^2 + x^3/3 + C_2\) instead of \(x + x^2 + x^3/3\), and if you take \(C_1 = 0\) and \(C_2 = 1/3\), then you see that \(x + x^2 + x^3/3 + 1/3 = (1 + 3x + 3x^2 + x^3)/3 = (1 + x)^3/3\). So if you include the constants of integration with the proper values, the two answers are the same.

That is why you should always write the “+ C” when computing indefinite integrals.

To put it a little differently, if Tony had included constants in his two integrals, as $$\int(1+x)^2dx = \frac{1}{3}(1+x)^3+C_1\\ \int(1+2x+x^2)dx = x+x^2+\frac{1}{3}x^3+C_2$$ then setting them equal would yield $$\frac{1}{3}(1+x)^3+C_1 = x+x^2+\frac{1}{3}x^3+C_2\\ \frac{1}{3}+x+x^2+\frac{1}{3}x^3+C_1 = x+x^2+\frac{1}{3}x^3+C_2\\ \frac{1}{3}+C_1 = C_2$$ so that we simply have two different constants of integration.
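The same conclusion can be checked with exact arithmetic. In this small Python sketch (the names `F1` and `F2` are mine, standing for Tony’s two antiderivatives without any constant added), the difference between the two is \(\frac{1}{3}\) at every sample point, so they differ only by a constant.

```python
from fractions import Fraction


def F1(x):
    """(1 + x)^3 / 3, from the substitution u = 1 + x."""
    return (1 + x) ** 3 / Fraction(3)


def F2(x):
    """x + x^2 + x^3 / 3, from integrating term by term."""
    return x + x ** 2 + x ** 3 / Fraction(3)


for x in (Fraction(-2), Fraction(0), Fraction(1, 2), Fraction(5)):
    print(x, F1(x) - F2(x))   # always 1/3
```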

This is very similar to the resolution of the “paradox” in the previously mentioned post, __1=0? Calculus Says So [or Not]__.

Both questions lead to the same lesson: In calculus (well, really anywhere in math!), you should never do anything without thinking. Renaming variables without considering the context, and blindly writing “+ C” without realizing what it means (or omitting it because you forget what it means) can leave you with confusing or simply wrong results. That’s what makes this an important thing to see!


We’re going back to early April for this question:

I’m attempting question 10, and I have been able to complete parts a and b, however, I’m stuck on part c. We’ve not yet covered differentiation of trig functions so I can’t see why that would be part of the question. I thought at first that I needed to sketch the inverse function but I realised that 1/(1+2secθ) is the reciprocal. I can’t figure out where to go from here. Any help is appreciated.

Normally, when you are asked where the gradient (slope) is zero, you find the derivative. Here Frankie is apparently expected to use knowledge of graphs to determine it indirectly.

He has correctly graphed \(y=1+2\sec(\theta)\) by a series of transformations from \(y=\sec(\theta)\), and found the points where the graph is horizontal (so the gradient must be zero). Here is an “official” graph showing both functions (secant in green, stretched and shifted up in red, but not strictly restricted to the specified interval):

Now, part (c) is about the **reciprocal** of this function; a graph is not required, just extrema, based on this graph. At least *imagining* the reciprocal graph, though, will be useful.

Doctor Fenton answered with a simple hint:

When you have a fraction whose numerator and denominator are both positive, then the fraction gets larger when the denominator gets smaller while the numerator stays the same (or when the numerator gets larger while the denominator stays the same, but that is not the case here). For example, 3/4 > 3/5.

If the fraction has a negative value and the numerator stays the same, then when the denominator gets smaller, the fraction has a larger magnitude (absolute value), but a negative number with a larger magnitude is “smaller” in the sense of being further left on the number line: -3 is to the left of -2, so we write -3 < -2; and 3/(-5) > 3/(-4). You have to use this meaning of smaller when finding maxima and minima of functions. Because of these facts, you don’t need calculus. You can tell just using the graph.

So you have the graph of y=1+2 sec x on the interval [-π,2π]. At what points will the function y=1/(1+2 sec x) have a (local) maximum? Where will it have a local minimum?

So the reciprocal function is **increasing** where the original is **decreasing**; and that should mean it will have a **maximum** where the original has a **minimum** … but there are some twists involving asymptotes and zeros.

Frankie replied,

Well, I’m not sure where the extrema are located. I was thinking maybe the local maximum could be as x approaches 3pi/2 or pi/2 as 1+2 sec(theta) will get smaller and smaller so the reciprocal will get larger and larger but I don’t know if this is a possible answer since it is undefined at these values so we could not produce any values for them so the local maximum maybe is just at pi. For the local minimum I had the same confusion i.e. that the local minima were at the asymptotes (-pi/2 and pi/2) but again this would mean we couldn’t get a value for 1+sec(theta) and also that pi/2 is related to both the minima and maxima so perhaps it is just at 0 or -pi as these can give a value for 1/(1+2sec(theta)) and have gradients of zero?

There are some good thoughts here, but he’s getting confused trying to think too abstractly.

I joined the discussion hoping to help him gain the hands-on experience needed to support such thinking:

Hi, Frankie.

I want to jump in here to suggest something that may help.

Don’t think too much about the specific question yet; you may first have to develop your sense of what taking the reciprocal of the function does, so you can see the implications.

Instead, just take your graph of 1 + 2 sec(θ) and use it to sketch a graph of 1/[1 + 2 sec(θ)] point by point, while thinking about what will happen at other nearby points.

Start with the relative minima and maxima on the graph you made. You have a point (0, 3); on the new graph, there will be a point (0, 1/3). Do the same for other similar points.

Now think about what will happen to points near each of those. Near θ = 0, your graph is higher than 3, so the new graph will be a little lower than 1/3, right? So which direction will the graph curve? Make a little curve showing that.

How about the asymptotes? As θ approaches π/2 from the left, the graph you have rises to +infinity. What will happen on your new graph? What happens on the right?

Show us the graph you make, and tell us what you found as you made it. Then we can talk about the implications for the question you were asked.

The key idea will be to consider the key points on the original graph, and learn to see how the graph will behave near them on the new graph, using Doctor Fenton’s ideas.

Doctor Fenton first pointed out a possible ambiguity:

You have a correct graph for y = 1 + 2sec x, and it is in four disconnected pieces. Two of them look like a U or an inverted U. Where are the local maxima and minima of these two pieces? The other two pieces each look like half of a U (or inverted U).

The problem just says “extrema” and doesn’t specify whether it means local extrema or absolute extrema. Usually, when we talk about local extreme points, there must be an open interval which contains the extreme point, and that point is larger than all the other points in the open interval (for a local maximum), or smaller than all the other points in the open interval (for a local minimum). An absolute maximum [minimum] is the largest [smallest] value of the function on the entire interval. When you have a graph on a closed interval, as in this case, you need to say whether you can talk about things like derivatives (or gradients) or relative extrema at the endpoint of the interval.

This doesn’t ultimately affect the problem, but it’s a valid concern.

The graph of y = 1 + 2sec x is in four pieces, and the graph of \(y_1\) = 1/(1 + 2sec x) will also be in four pieces (although if you graph it, it will look like a single continuous curve). (I’m calling the second graph \(y_1\) so we can tell more easily which graph I mean.) It is clear from your graph that when x = 0, y = 3, so \(y_1\) = 1/3. If 0 < x < π/2, then y > 3 (so \(y_1\) < 1/3), while if -π/2 < x < 0, y > 3 (so \(y_1\) < 1/3) also. That means that y has a relative minimum at x = 0. What does \(y_1\) have at x = 0?

Here we are thinking about behavior of the new graph at the point corresponding to a relative minimum of the original function. Since the original function is **higher** on either side of \((0,3)\), the new function must be **lower** near \((0,\frac{1}{3})\), making that a **relative maximum**.

Note that y has a vertical asymptote at x = -π/2, and that when x < -π/2, then y < 0, while if -π/2 < x < 0, then y > 0. What can you say about \(\lim_{x\to -\pi/2^-} y_1\) (the limit from the left)? What about the limit from the right, \(\lim_{x\to -\pi/2^+} y_1\)?

Where the original graph **rises toward infinity**, the new graph must **fall toward zero**.

Frankie answered, providing a (mostly) reasonable graph:

I have drawn a sketch of the function. As I went along I noticed all points below the x-axis (excluding -1) got larger/closer to positive infinity but smaller in magnitude, and conversely above the x axis all the points got smaller/further from positive infinity, though again all points got smaller in magnitude. This makes sense that all points besides those with a value of 1 would get smaller in magnitude. Also as the values approached the asymptotes of the original graph they all approached zero but would never reach since this would mean the function would be undefined, so there would now be an asymptote at y = 0.

I think the part about magnitudes and infinity was an echo of Doctor Fenton’s statement that “a negative number with a larger magnitude is ‘smaller’ in the sense of being further left on the number line”. The reciprocal of a negative number has smaller magnitude than the number itself, and is therefore greater (“closer to positive infinity”) than the original. A better point would have been that as the negative value of the original function *increases* toward the relative maximum of -1, its reciprocal *decreases* toward -1, which will therefore be a minimum.

The point about the new graph approaching zero where the original had an asymptote, is correct. We’ll discuss the wrong claim of an asymptote at \(y=0\).

Unfortunately, where he had previously given correct extrema (though without knowing which were correct), he has been distracted from the goal by my recommendation to graph the function.

Doctor Fenton responded:

You seem to be getting away from the question, which was to find the maximum and minimum values of y = 1/(1 + 2 sec x) on the interval -π ≤ x ≤ 2π.

You are also discussing asymptotes, but there are no vertical asymptotes of y = 1/(1 + 2 sec x) in the interval (there are two points where it is undefined), and you cannot have a horizontal asymptote on a finite interval. Your graph is roughly the right shape, and you should be able to determine the maximum and minimum points on the interval on it.

Frankie replied with the correct answer:

For the extrema I can now see where they are, -1 is the minimum value of the function and this occurs at -pi and pi since the question asks for the smallest positive value this must be pi. As for the maximum this occurs at 0 and 2pi.

I have a question. If the graph did not have a finite interval would there be an asymptote at y = 0? As surely no value could be 0 due to the equation being 1/(1+2 sec(theta)).

The minima are at \(\left(\pi,-1\right)\) and \(\left(-\pi,-1\right)\); the maxima are at \(\left(0,\frac{1}{3}\right)\) and \(\left(2\pi,\frac{1}{3}\right)\), since \(\sec 0 = \sec 2\pi = 1\) gives \(y = \frac{1}{1+2} = \frac{1}{3}\).
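For readers who want to confirm the extrema numerically, here is a minimal Python sketch. It samples \(y = 1/(1 + 2\sec x)\) on a grid over \(-\pi \le x \le 2\pi\) and skips the points where \(\sec x\) is undefined. The grid size and the tolerance used to detect the "holes" are my own arbitrary choices, not part of the original discussion.

```python
import math

def y(x):
    """y = 1 / (1 + 2 sec x); returns None where sec x is undefined."""
    c = math.cos(x)
    if abs(c) < 1e-12:          # a "hole": cos x = 0, so sec x is undefined
        return None
    return 1 / (1 + 2 / c)

N = 3000                         # arbitrary grid resolution
xs = [-math.pi + k * (3 * math.pi / N) for k in range(N + 1)]
values = [y(x) for x in xs]
values = [v for v in values if v is not None]

y_min = min(values)              # expected near -1 (at x = -pi and x = pi)
y_max = max(values)              # expected near 1/3 (at x = 0 and x = 2*pi)
print(round(y_min, 6), round(y_max, 6))
```

A grid search like this only suggests where the extrema are; the reasoning in the conversation is what justifies them.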

Doctor Fenton pointed out the errors in talking about asymptotes:

There are no vertical asymptotes of y = 1/(1 + 2 sec x), since the denominator never approaches 0. I’m not sure what you mean by “an asymptote at y = 0”.

Vertical asymptotes have equations x = k for some k, and we call it a vertical asymptote at x = k. But y = 0 is a horizontal line (the x-axis). Even if we consider the graph over the entire real line (there are points where the denominator approaches ∞ or -∞, so the graph is undefined there), this function is periodic: the piece between x = -π and x = π just keeps repeating. It never approaches a horizontal line.

I jumped back in to comment on this asymptote issue:

Hi, Frankie.

I suspect you are confusing the idea of an asymptote (which is a line that the function approaches as you go to infinity) with the related idea of the function being undefined. Many points on a function that are undefined are associated with vertical asymptotes, because the function becomes infinite as it approaches there; but sometimes the graph just has a “hole” because it is undefined at one place, but it remains finite nearby. That is what is happening here. (You have probably seen “holes” in the context of rational functions.)

Here is the original graph in red, including its asymptotes, and the reciprocal graph in blue, showing “holes”.

I made a different graph at the time:

The denominator, 1+2 sec(θ), is undefined at these points, so the function 1/(1+2sec(θ)) is likewise not defined there; but the function is approaching 0, not infinity. Here is what the graph really looks like:

(I made this on Desmos, but had to manually add the “holes”, because computers tend not to notice such little details! The open circle represents the fact that one point has been removed from the curve, though points right next to it exist.)

This agrees with your graph. You even included the “holes”, by not letting your graph continue through zero. So you have done well.

I added the explanation of how I made this, as a reminder not to trust a computer too much. It uses a rote method to plot points, without intelligence.

Frankie replied,

Yes, I was getting confused, I assume then that the reason why there isn’t asymptotes on a finite interval is that they don’t approach infinity. I see now comparing the Desmos graph that it should have a more convex shape as opposed to its concave shape so I have redrawn it in red.

Anyhow thank you both for your help on this question.

Details of the shape, such as convexity, don’t really matter for this problem.

I answered,

Hi, Frankie.

Right. Horizontal asymptotes are relevant only over an unbounded interval. Of course, there could be a vertical asymptote, but there isn’t in this case.

Doctor Rick now added his own comments to both of ours:

Hi, Frankie. If I may join in also, I think there might still be something to clear up about horizontal asymptotes.

Doctor Fenton said,

You cannot have a horizontal asymptote on a finite interval.

You responded with,

If the graph did not have a finite interval would there be an asymptote at y = 0?

Doctor Fenton did not know what you meant by this, but I assume you were asking if there would be a horizontal asymptote of the function y = 1/(1 + 2 sec x) on its full natural domain.

Doctor Peterson talked about confusions regarding asymptotes, which did apply to horizontal asymptotes:

I suspect you are confusing the idea of an asymptote (which is a line that the function approaches as you go to infinity) with the related idea of the function being undefined.

However, he talked mostly about the fact that the function y = 1/(1 + 2 sec x) does not have vertical asymptotes. You then replied:

I assume then that the reason why there isn’t asymptotes on a finite interval is that they don’t approach infinity.

So let me say a bit more about horizontal asymptotes. First, this function is periodic, and a periodic function cannot have a horizontal asymptote. The value of the function does not approach 0, or any other value, as x increases toward infinity; every value in its range continues to appear as its output in each cycle. For instance, y = 1/3 when x = 0, 2pi, 4pi, 6pi, and so on “forever”; but y = -1 when x = pi, 3pi, 5pi, and so on “forever”. It never “settles down”.

Doctor Fenton had mentioned this very briefly.

Second, I wonder if you were thinking that a horizontal asymptote is a horizontal line that never intersects the graph of the function. You might say, Since y never equals zero (though it can get as close to 0 as we want), y = 0 must be a horizontal asymptote. But as Doctor Peterson said, that’s not how to think of an asymptote — it’s all about the graph of the function getting closer and closer to the asymptote as x increases or decreases (for a horizontal asymptote) or as y increases or decreases (for a vertical asymptote).

I may be wrong about what you were thinking, but perhaps this explanation might help you.

The suggestion is that Frankie might be thinking of this as an asymptote, because our function “approaches” it (at certain points) without reaching it:

But a horizontal asymptote involves *infinity*.

Frankie responded:

Yes that does make sense. It seems as though asymptotes are quite closely related to the idea of limits. In that they both arise as the desired input increases/decreases in magnitude. However where an asymptote is essentially “undefined” as the function continues to approach it forever, a limit is “defined” in that it gives us a value the function would, under certain conditions, reach. So they are kind of the converse of each other?

“Converse” is definitely the wrong word! But they can be thought of as the same phenomenon viewed from different directions.

Doctor Fenton finished the discussion with a fuller definition of each kind of asymptote:

Asymptotes aren’t just “related” to limits, they are DEFINED by limits. A vertical asymptote occurs at x = a whenever at least one of the one-sided limits

\(\lim_{x\to a^{+}} f(x) = \infty\) (or \(-\infty\)) or \(\lim_{x\to a^{-}} f(x) = \infty\) (or \(-\infty\)).

That means that the function becomes unbounded in every neighborhood of a. That has nothing to do with whether f(a) is defined or undefined: it is the behavior of f in neighborhoods (a, a+δ) or (a-δ, a) that determines whether a vertical asymptote exists. (In fact, whether f(a) is defined or not has NO effect on whether a limit as x→a exists.) The vertical asymptote is the vertical line x = a. It is not undefined.

In this case, the asymptote is defined by the value of *x*, which is very much defined!

A horizontal asymptote is defined by a limit at +∞ or at -∞. The horizontal line y = L is a horizontal asymptote of f(x) if either \(\lim_{x\to\infty} f(x) = L\) or \(\lim_{x\to-\infty} f(x) = L\).

It is also commonly misunderstood that a function graph cannot intersect a horizontal asymptote. That is true for some functions, such as f(x) = 1/x (y = 0 is a horizontal asymptote in both directions) or \(f(x) = 1/(1+e^{-x})\) (y = 1 is a horizontal asymptote as x→∞). But y = 0 is also a horizontal asymptote for \(f(x) = e^{-x}\sin(x)\), and f(x) crosses the asymptote infinitely many times.

An ordinary limit tells you about the behavior of a function as x gets nearer and nearer to a point, but it tells you NOTHING about f(a) itself. f(a) may exist or not; it does not affect whether f has a limit at a.

A horizontal asymptote tells you about the behavior of a function for ever increasing values of x.
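To make the last example concrete, here is a small Python sketch of \(f(x) = e^{-x}\sin(x)\). It samples the function at the midpoints \((k+\tfrac12)\pi\), where \(\sin\) is \(\pm 1\); the choice of ten sample points is arbitrary and mine. Each sign change between consecutive samples means the graph crossed y = 0 somewhere in between, while the shrinking magnitudes show it approaching that line.

```python
import math

def f(x):
    """The damped sine wave e^(-x) * sin(x)."""
    return math.exp(-x) * math.sin(x)

# Sample between consecutive multiples of pi, where sin(x) is +1 or -1.
samples = [f((k + 0.5) * math.pi) for k in range(10)]

# A sign change between neighbors means the graph crossed the line y = 0.
sign_changes = sum(1 for u, v in zip(samples, samples[1:]) if u * v < 0)

# The magnitudes shrink toward 0, which is what makes y = 0 an asymptote.
shrinking = all(abs(v) < abs(u) for u, v in zip(samples, samples[1:]))
print(sign_changes, shrinking)
```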

It seems that most of the interesting questions recently have been about relatively advanced topics, though commonly in introductory classes. Here, we’ll help a student think through a problem introducing the idea of a random walk on a graph. (“Graph” here doesn’t mean the graph of an equation, which we study in algebra, but the subject of Graph Theory: an arrangement of nodes and edges, as we have previously discussed in Graph Coloring: Working Through a Proof, and less formally in Frequently Questioned Answers: Three Utilities.) We’ll watch the evolution of a representation of the problem until it becomes easy.

Here is the problem, from the start of May (I have replaced the low-quality image that was provided with my own drawing):

Please find the question below. Help me in solving it.

Perform random walk in the undirected graph of Figure 1:

Suppose we start from node A at Step 0. At each step, we move to one of the neighboring nodes with equal probability. Then for each node we have possibly moved to, we continue to move from it to the next node using the same probability rule.

In the first two steps, the probability of walking to each node is shown in Table 1. Please run and obtain the probabilities for Step 2 to 4. Besides filling in the table, you should also write down the computation processes.

Note: Theoretically, the probabilities of walking to certain nodes will converge to a stationary distribution, if the number of steps is large enough. This question aims to provide you with a sense of such convergence.

At a higher level, we would be asked to prove this convergence, but that is far beyond the level we are at, so I will stick to the question as asked.

Step 1 is easy, as there are two equally-likely destinations, so each of nodes B and C has probability \(\frac{1}{2}\), and no other node is possible:

But what happens next?

Since Chethan had not shown where he needed help, I started as usual, with a request and a hint for starting:

Hi, Chethan.

Since it says, “This question aims to provide you with a sense of such convergence”, it will be essential for you to work through it yourself to see how it works. They have taken you through the first step; you just need to continue the process.

What I need, in order to help you, is some information about what help you need. You can either show me what you think you need to do for the next step or two so I can see if you understand it, or ask some specific questions to show me where you are unsure.

For each of the nodes you might be at (B and C), find the probability of going to each other possible node, and write in the probability that you will be there at the next step (multiplying and adding as needed). You’re expected to show your thinking, so tell me how you get the numbers. Then we can discuss it.

The central idea here, hinted at in my suggestion, is that the probability of ending up at a given node is the sum of the probabilities of each path to get there; and the probability of getting there by a given path is the product of the probabilities of each choice made along the way. This is somewhat reminiscent of the “Pascal’s triangle” approach to counting paths discussed in How Many Paths from A to B?
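Here is a minimal Python sketch of that idea, done by brute force: list every path of a given length, multiply the probabilities along each path, and add up the paths that end at the same node. The adjacency list is an assumption on my part, inferred from the spreadsheet formulas quoted later in this post (edges A–B, A–C, B–C, C–D, C–E, D–E); the names `NEIGHBORS` and `path_probabilities` are just for illustration.

```python
from fractions import Fraction

# Assumed edges, inferred from the spreadsheet formulas later in the post.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}

def path_probabilities(start, steps):
    """Enumerate every path of the given length and total them by end node."""
    paths = [(start, Fraction(1))]           # (current node, probability so far)
    for _ in range(steps):
        next_paths = []
        for node, prob in paths:
            choices = NEIGHBORS[node]
            for nxt in choices:
                # Product rule: multiply by the chance of this single choice.
                next_paths.append((nxt, prob / len(choices)))
        paths = next_paths
    totals = {node: Fraction(0) for node in NEIGHBORS}
    for node, prob in paths:                 # Sum rule: add paths ending here.
        totals[node] += prob
    return totals

step2 = path_probabilities("A", 2)
print({node: str(p) for node, p in step2.items()})
```

The number of paths grows quickly with the number of steps, which is exactly why a better way of organizing the work becomes attractive.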

This request yielded the initial work I needed:

I found out Probabilities in Step 2.

I’m not able to do it for Step 3 and Step 4.

Please solve and send the solution if possible.

This is correct as far as it goes, though he hasn’t explained all the details. So let’s examine his work.

After step 1, we are at either B or C, each with probability \(\frac{1}{2}\). From B, he shows two arrows marked with the probability \(\frac{1}{4}\); this is the probability of being at B \((\frac{1}{2})\), times the probability of taking that path from B \((\frac{1}{2})\); similarly, from C, he shows four arrows marked with the probability \(\frac{1}{8}\); this is the probability of being at C \((\frac{1}{2})\), times the probability of taking that path from C \((\frac{1}{4})\). Then he has added the probabilities on the arrows reaching each node, and put those in the table. (As a check, I would add up the row to make sure they total 1, and they do.)

This is just right, and is just what I had suggested. So we’re doing well … except that he was unable to continue.

Unfortunately, the work didn’t show me where Chethan might be going wrong, so it was less useful than an error would have been! (Many students fail to recognize the value of failure, both as a way to learn and as a way to ask for help.)

I replied, quietly rejecting his request for a full solution:

You got step 2 right. I don’t see why you can’t do the next. Please show your attempt, or tell me what went wrong. I want you to be able to work it out yourself.

Chethan answered,

I’m not able to find Probabilities at Node A and Node B.

Please solve and send the solution if possible.

He has correctly shown what happens if we were at D or E, from which we can get only to C, D, or E; he seems not to have tried handling paths from A, B, or C, for no clear reason. I suspect this method, drawing out every possible transition, is just getting a little too complicated! A better method may be needed. I often encourage students to pursue a more or less brute force method at first, if only to gain a feel for how things work while looking for a way to make it easier. Shortcuts are often best seen after you walk past them. (That reminds me of some times when I’m out geocaching in the woods and realize, after bushwhacking to a goal, that there was a deer trail I could have taken instead …)

I suggested some alternative ways to organize the work:

I imagine your difficulty is in finding a way to keep track of different ways to get to each node. I see two ways to do it.

You appear to be working backward, listing all the ways to get *to* a given node, and adding up the probabilities. I think it can be done this way if you are careful, but you have missed some ways to get to C, D, and E, and apparently didn’t try for A and B. [I now think I was wrong in this guess.]

My first thought was to work forward. I would list each node with its probability from step 2, and then below each I would list the nodes I can get to *from* it, each with its new probability. Then I would add them up.

What Chethan had not done is simply to start at **every** possible node, rather than just D and E.

One difficulty in the method he has been using is simply that there will eventually be too many arrows to keep track of. So I suggested a table to replace the graph itself:

For instance, here is how I might have done step 2:

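From | A | B | C | D | E | Sum |

Prob. | 0 | 1/2 | 1/2 | 0 | 0 | |

To A | | 1/4 | 1/8 | | | = 3/8 |

To B | | | 1/8 | | | = 1/8 |

To C | | 1/4 | | | | = 1/4 |

To D | | | 1/8 | | | = 1/8 |

To E | | | 1/8 | | | = 1/8 |

For example, there are four places to get to from C, so the probability for each of those paths is 1/4, and I multiplied 1/2 by 1/4 for each.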

This gives the results you got, but in a way that can be applied to any state.

I intentionally only redid step 2, for which we already have the answer, to make the difference clear; this is just a different representation of the same work, with a cell in the table replacing an arrow on the diagram. Now that all nodes have been reached, steps from here on will need all 25 possible arrows (or cells), not just the 6 we had here!

Another disadvantage of even this table is that, because the probabilities are new for each step (taking into account the starting probabilities from the previous step), we are missing an opportunity to reuse work from one step to another. We’ll be fixing that!

There are many ways to organize this, some perhaps less space-consuming than this (but harder to type). There are also ways to do this with matrices, but I am guessing you are not familiar with those.

See what you can do for step 3 now.

The use of matrices is part of the more advanced approach we’re not going to get to.

Chethan again was reluctant to keep showing work:

Sorry, I’m unable to find out the Probabilities in Step 3.

Please help me out with this.

Please send the complete solution so that I can understand and get back to you if I still have any doubts.

I chose to apply some “tough love” and demand an effort:

No, I will not let you bypass learning to think for yourself; I won’t give up on your intelligence. You are not unable; you appear to be unwilling to take the risk of being wrong, which is the only way to learn.

You have shown that you understand the basic ideas by getting step 2 right; and I have shown you a complete solution for step 2, which should be enough to get you moving forward. I know you can do what I suggest, either by copying what I did or by inventing some other way of organizing your thinking. A complete solution of the rest will not help you learn the things this exercise is intended to teach you.

Show some more thinking on step 3, and then we can discuss what you are misunderstanding, if anything.

I’m often a little worried when I do this, but it worked. Now I got what I wanted; in fact, he is now devising a new representation, departing from the form in which the graph was given:

This is all I could do.

If it’s wrong, please send the correct solution.

Here we see work from the start all the way to step 3, and a major change in representation. He is now showing each entire path from the start, rather than just working with one step at a time. In particular, he now has a row at each step for all nodes with non-zero probabilities, though those are wrong at step 3, and each arrow is now labeled only with the probability at that step, not the cumulative probability. The latter change is one I consider helpful; the former perhaps overcomplicates things – but in the process, we can better see how he is thinking!

I replied:

Thanks. Now we have something to talk about.

Your answers for step 3 are wrong, but you’ve given me some questions to ask.

First, you have invented one of the alternative representations of the work that I had considered, namely rewriting the network in a way that shows all possible motions. The trouble is, though you nicely show the relevant part for step 2, you didn’t show all of it for step 3.

In using that method, I would want to put ABCDE on one line, and another copy of ABCDE on the next line, and then show all possible paths from each node at one step to any nodes in the next step. It would look like this:

Now we can easily see where we can get to from each node, and we can assign a probability to each path. Then to get the probabilities for the next step, we just multiply the probability of being at each node already, times the probability of each path, then add those up to get the new probability for each node.

This new representation, used as I described here, will allow us to flatten the whole process, using the very same diagram repeatedly for each step. If Chethan had drawn two complete rows, he would have seen this. (Of course, this diagram has so many arrows it would be impossible to label each arrow, so we’ll have to move to a table format.)

Now I have to ask for your thinking at some points in your work. In step 3, you say you can’t get to A or B at all:

Can you tell me why you say that? You can always go to A from B or C, and to B from A or C. I suspect this indicates some basic misunderstanding of the meaning of the graph.

I was wondering if he didn’t fully understand that this is an undirected graph, on which you can go both up and down.

Next, for D you added two terms, though your drawing only shows one path:

And the second of these shows four factors, whereas anything at step 3 should have three factors. What does that second term represent? You do something similar for C.

We won’t get to see the reason for this wrong work, which could be just a typo or something deeper.

Now I dealt with the fact that Chethan is looking at each entire path, rather than just the changes from one step to the next:

Now, I wouldn’t go back to the start for each path; instead, I’d start with the already-calculated probabilities for one step and just multiply each of those as appropriate for the next step, so I’d have a sum of products of only two fractions for each term. But what you did helped me see better what you are thinking.

And in order to make the work easier, I would observe that each step is obtained from the step before in the same way. For example, if A represents the prior probability of being at A, and so on, then the new probability of being at A would be 1/2 B + 1/4 C. This makes a recursive formula that I can fill in for each step with very little new thought.

I’m glad that you didn’t just take the method I suggested, but instead thought for yourself. I hope these ideas help you move further. The sort of ideas I’m talking about illustrate how you think through a new problem, and invent notations and techniques for solving it.

The idea of a recursive process, where we do exactly the same calculations at each step, is key to the whole idea of the problem.

Chethan responded,

Are these Probabilities correct?

If yes, please help me with Step 4.

This is exactly what I had in mind; with the work he’s put into trying things on his own, Chethan was well prepared for the new method. A few minutes later (it was the middle of the night for me), he added,

I even tried Step 4.

Is it correct?

It is indeed! I answered,

Both are good. (I first checked by seeing that each row sums to 1; then compared to a spreadsheet I made yesterday.)

I think you’ve got it.

If you write the probabilities as decimals, you can see the limiting distribution a little more clearly, especially by row 5 or 6.

Chethan replied,

Thanks for your help.

I couldn’t have solved without your support.

I mentioned a spreadsheet; this implemented the recursion, making each row from the one before like this:

Step | A | B | C | D | E |

0 | 1 | 0 | 0 | 0 | 0 |

1 | =C2/2+D2/4 | =B2/2+D2/4 | =B2/2+C2/2+E2/2+F2/2 | =D2/4+F2/2 | =D2/4+E2/2 |
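The same recursion can be sketched in a few lines of Python, for anyone who prefers code to a spreadsheet. This is only a transcription of the row formulas above, with exact fractions instead of cell references; the function name `next_step` is mine.

```python
from fractions import Fraction

def next_step(p):
    """Apply the spreadsheet's row formulas to one row (A, B, C, D, E)."""
    a, b, c, d, e = p
    return (
        b / 2 + c / 4,                  # new A: from B (1/2) and C (1/4)
        a / 2 + c / 4,                  # new B: from A (1/2) and C (1/4)
        a / 2 + b / 2 + d / 2 + e / 2,  # new C: from each of A, B, D, E (1/2)
        c / 4 + e / 2,                  # new D: from C (1/4) and E (1/2)
        c / 4 + d / 2,                  # new E: from C (1/4) and D (1/2)
    )

row = tuple(Fraction(v) for v in (1, 0, 0, 0, 0))   # step 0: start at A
table = [row]
for _ in range(4):
    row = next_step(row)
    table.append(row)

for step, r in enumerate(table):
    assert sum(r) == 1                  # the same row-sum check I used
    print(step, *r)
```

Because each row is built only from the row before it, extending the table to more steps costs nothing more than a larger `range`.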

The result for the first four steps (displayed as fractions) looks like

A 3D bar chart shows how the probabilities change from step to step:

At the back we see the starting row, with P(A) = 1; in the front we see the fourth step, with C about twice as likely as each of the others. What happens if we continue for 10 steps?

This makes it clear that the respective probabilities are approaching \(\frac{1}{6}\), \(\frac{1}{6}\), \(\frac{1}{3}\), \(\frac{1}{6}\), \(\frac{1}{6}\), which do, of course, add up to 1. In a later course, or a later part of the same course, this may be proved.
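Short of a proof, we can at least check that this limiting distribution is unchanged by one more step of the walk. The sketch below assumes the same edges as the spreadsheet formulas (A–B, A–C, B–C, C–D, C–E, D–E) and takes each node's probability proportional to its number of neighbors, which is a standard fact about random walks on undirected graphs rather than something derived in this post. It then runs ten steps from A for comparison.

```python
from fractions import Fraction

# Assumed edges, matching the spreadsheet formulas above.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}

def step(p):
    """One step of the walk: each node splits its probability among neighbors."""
    new = {node: Fraction(0) for node in NEIGHBORS}
    for node, prob in p.items():
        for nxt in NEIGHBORS[node]:
            new[nxt] += prob / len(NEIGHBORS[node])
    return new

# Candidate limit: probability proportional to the number of neighbors.
total_degree = sum(len(nbrs) for nbrs in NEIGHBORS.values())
candidate = {n: Fraction(len(nbrs), total_degree) for n, nbrs in NEIGHBORS.items()}
is_stationary = step(candidate) == candidate

# Ten steps starting from A, shown as decimals for comparison.
walk = {n: Fraction(int(n == "A")) for n in NEIGHBORS}
for _ in range(10):
    walk = step(walk)
print(is_stationary, {n: round(float(p), 4) for n, p in walk.items()})
```

This shows that the distribution is stationary and that ten steps land close to it; it does not by itself prove convergence from every starting point.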

Two weeks ago, in Proving Certain Polynomials Form a Group, we joined a beginner in learning about groups. Here we will pick up where that left off, learning how to prove that the group we saw there, a subset of polynomials, is isomorphic to a group of matrices. As we did there, we will stumble through some misunderstandings that are probably more common than teachers of these topics may realize – one of the benefits of watching conversations like this.

First, as we saw before, a **group** is a set with an operation that satisfies certain requirements: The set must be **closed** under the operation, which must be **associative**, and the set must include an **identity** element and an **inverse** for each element.

Here we will be seeing if two groups are **isomorphic**, which essentially means that they behave alike – they have the same underlying structure. Specifically, there must be an **isomorphism**: a map between the sets that is **one-to-one** and **onto** (injective and surjective), so that each element of one corresponds to exactly one element of the other, and it must **preserve the operation**. We’ll be seeing more about that.

The question came to us in early April, soon after the other:

Here is my assignment:

I understand that for two groups to be isomorphic they must be 1-1 and onto.

My understanding is that you must prove that they are of the same order:

- Square matrix and second degree polynomial are the same order. Not sure if anything else needs to be explained.
- Need to prove that these groups are 1-1.

I know that they are individually 1-1 because we found the inverse in the last question and the matrix determinant does not equal 0. I am not sure how to go about proving that they map onto each other and are the same. I tried multiplying the matrix by itself to produce the “x^2” which would produce a determinant of 2a^2b^2. I just was not sure where to go from there.

We’ll be seeing that there is an apparent typo in the problem, and that Jenny is having trouble distinguishing some central concepts (which is what we’re here for!).

The two groups are not fully defined here, as we also need their operations. The operation on \(\hat{M}\) is ordinary matrix multiplication; the “*” operation on \(\hat{P}_2\), as mentioned, was defined last time: $$(ax^2+ b)*(cx^2+ d) = (ac)x^2 + (ad + bc)$$

Doctor Fenton answered:

To prove isomorphism of two groups, you need to show a 1-1 onto mapping between the two. Just observing that the two groups have the same order isn’t usually helpful. (In this case, both sets are infinite, so you need to show that they have the same infinite cardinality.) You also need to show that the isomorphism preserves operations, i.e. the mapping is a group homomorphism: that is, if \(g_1\) and \(g_2\) are elements of G and \(\varphi: G\to H\) is the mapping, then \(\varphi(g_1g_2)=\varphi(g_1)\varphi(g_2)\), and for all \(g\in G\), \(\varphi(g^{-1})=\varphi(g)^{-1}\). Those are the properties that you need to verify.

The order of a group is the number of elements in it (its cardinality). It isn’t clear whether Jenny, in saying they have the same order, meant that (both are infinite), or something else such as both having two dimensions. Either way, it suggests that there is a one-to-one (“1-1”) mapping between them, but we need a very *special* one.

The statement of the problem is also wrong.

You just described the set as “square matrices”, but that is too vague. The matrices should be described as 2×2 upper triangular real matrices, with the (1,1) entry non-zero.

The problem is that \(\hat{M}\) is NOT a group. Since the problem doesn’t specify any operation for \(\hat{M}\), we can assume that the operations are ordinary matrix operations. Did you notice that \(\hat{M}\) isn’t closed under multiplication? You also state that the matrices have non-zero determinant. The determinant of \(\begin{bmatrix}a & b \\0 & b \\\end{bmatrix}\) is ab, but the definition doesn’t require that b≠0, so all matrices \(\begin{bmatrix}a & 0 \\0 & 0 \\\end{bmatrix}\) belong to \(\hat{M}\), and they do not have non-zero determinants.

While we’re on the topic, I’ll insert here a further explanation he gave when he returned to this issue later:

I pointed out in my first reply that the set of matrices as described is NOT a group. If you multiply \(\begin{bmatrix}a & b \\0 & b \\\end{bmatrix}\begin{bmatrix}c & d \\0 & d \\\end{bmatrix}\), the result is \(\begin{bmatrix}ac & ad+bd \\0 & bd \\\end{bmatrix}\), which is NOT of the form \(\begin{bmatrix}A & B \\0 & B \\\end{bmatrix}\), so the set is NOT closed under matrix multiplication.
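A single numerical counterexample is enough to show the failure of closure. Here is a minimal Python sketch with two matrices of the printed form; the particular entries 2, 3, 5, 7 are arbitrary choices of mine, and the helper names are only for illustration.

```python
def matmul2(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]],
        [m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]],
    ]

def has_printed_form(m):
    """True if m looks like [[a, b], [0, b]]: upper right equals lower right."""
    return m[1][0] == 0 and m[0][1] == m[1][1]

m1 = [[2, 3], [0, 3]]            # a = 2, b = 3
m2 = [[5, 7], [0, 7]]            # c = 5, d = 7
product = matmul2(m1, m2)        # [[ac, ad + bd], [0, bd]]
print(product, has_printed_form(product))   # [[10, 35], [0, 21]] False
```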

Can we correct the problem, guessing what was intended? Yes:

So the correct answer to the problem AS WRITTEN is “This doesn’t make sense”. The problem tacitly assumes that M is a group (since the symbol used is a symbol for isomorphism) under standard matrix multiplication (since it doesn’t mention a different operation). However, it seems that the problem was expected to make sense (it didn’t ask if M WAS a group). That suggests that something very similar should make sense.

One thing to consider is that since each polynomial is defined by two numbers, a and b, each matrix should also be determined by two numbers. The given definition of M repeats one of the numbers – perhaps it should have repeated the other number instead. That change makes M a group, it works to make matrix multiplication create a new element in the same way that the * operation does in \(\hat{P}_2(x)\).

So the problem apparently has a misprint. \(\hat{M}\) should be the matrices of the form \(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) with a ≠ 0.

This set IS a group, and the homomorphism between \(\hat{M}\) and \(\hat{P}_2(x)\) should be pretty clear.

The problem, then, should really be this:

Let \(\hat{M} =\left\{\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}|a,b\in\mathbb{R},a\ne0\right\}\) and \(\hat{P}_2(x) = \left\{ax^2+b|a,b\in\mathbb{R},a\ne0\right\}\) using the * operation defined previously, namely \((ax^2 + b)*(cx^2 + d) = (ac)x^2 + (ad + bc)\).

Show \(\left <\hat{P}_2(x),*\right >\cong\left <\hat{M},\cdot\right >\).

Jenny saw the need to start at the beginning:

So if I take this one step at a time. What are the steps to show a one-to-one mapping between a matrix and a binomial?

Doctor Fenton responded, suggesting a way to look for the mapping, by comparing **structures**:

Math is about patterns, and the most obvious patterns to notice here are

- the matrices \(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) have only two distinct entries;
- the polynomials \(ax^2+b\) have only two distinct coefficients.

Does that suggest anything?

Also, the diagonal entries of the matrix occur twice, once in each row, and one coefficient of the polynomial multiplies \(x^2\), but probably even more importantly, in the given definition of multiplication, the coefficient of the \(x^2\) term occurs in both of the coefficients of the product of two polynomials.

In abstract algebra, you are learning to look at mathematical structures. In the case of groups, the group is a set of elements, but it’s not just a set. It has an algebraic structure imposed: in these examples, a way of multiplying elements. You don’t just consider any functions between groups that are just set mappings. The only mappings of interest are those that preserve the multiplication structures: if you take two elements of one group and find their product, and then look at the images of the first two elements in the second group, the product of the two images in the second group will be the image of the product in the first group. That’s what I meant by writing \(\varphi(g_1g_2)=\varphi(g_1)\varphi(g_2)\).

The first hint suggests that we should match up *a* to *a* and *b* to *b* in the two objects; but that only gives us something to try. We then have to see whether that works, or we need to try something else (such as matching *a* to *b*!).

Since it hasn’t been written out yet, let’s look at the product of two of these matrices: $$\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}=\begin{bmatrix}ac & ad+bc \\0 & ac \\\end{bmatrix}$$ We see that the set is closed (because we have *ac* twice on the diagonal), and also that there is a strong parallel with the *-product \((ax^2 + b)*(cx^2 + d) = (ac)x^2 + (ad + bc)\). This will be the motivation for our mapping.
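Before attempting a proof, it can be reassuring to test the parallel numerically. The Python sketch below represents the polynomial \(ax^2+b\) as the pair `(a, b)` and tries the most obvious candidate map, sending \(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) to `(a, b)`. That candidate is an assumption here (it is what the conversation is working toward, not something established yet), and random integer checks are evidence, not a proof.

```python
import random

def matmul2(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]],
        [m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]],
    ]

def star(p, q):
    """(a x^2 + b) * (c x^2 + d) = (ac) x^2 + (ad + bc), with pairs (a, b)."""
    a, b = p
    c, d = q
    return (a * c, a * d + b * c)

def phi(m):
    """Candidate map (an assumption): [[a, b], [0, a]] -> a x^2 + b."""
    return (m[0][0], m[0][1])

def random_element():
    """A random matrix [[a, b], [0, a]] with small integer entries, a != 0."""
    a = random.choice([n for n in range(-9, 10) if n != 0])
    b = random.randint(-9, 9)
    return [[a, b], [0, a]]

random.seed(0)
checks = []
for _ in range(100):
    m1, m2 = random_element(), random_element()
    # Operation-preserving: map the product, or multiply the images.
    checks.append(phi(matmul2(m1, m2)) == star(phi(m1), phi(m2)))
print(all(checks))
```

The comparison inside the loop is the numerical counterpart of \(\varphi(m_1m_2)=\varphi(m_1)*\varphi(m_2)\); the algebraic version follows from the matrix product written out above.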

Jenny replied,

I think what you are saying is that I am coming from lower math where we just use formulas that remain consistent for a type of problem. In higher math, we are looking for patterns that are unique to the information given. So in this problem with the matrix below, I see that the coefficients in the binomial match the entries in the matrix. And I see that when I multiply that matrix by a new matrix

\(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\)

I get the star product given in my last problem (ac)x^2 + (ad + bc)

Which I believe is the definition of an isomorphism. So by logic and arithmetic, I see that is true. How do I show that in an abstract algebra way?

After a side discussion, Jenny continued:

To verify two functions are isomorphic, I believe we need to show they are operation preserving, one to one, and onto. If we are to assume the problem was written incorrectly, we have already shown that our corrected version is operation preserving. I have never understood how to show they are 1 to 1 and onto. I understand what they are, just not how to prove it.

Doctor Fenton first corrected Jenny’s wording, then talked about the requirements for a proof:

Let’s be careful about terminology. You are showing that two groups are isomorphic, not two functions. A function or mapping between two groups is a homomorphism if it is operation-preserving, and an isomorphism is a one-to-one and onto homomorphism.

To show a mapping φ:G→H is one-to-one, the usual procedure is to assume that g₁ and g₂ are elements of G such that φ(g₁) = φ(g₂), and then show that g₁ = g₂.

To show that φ is onto, let h be any element of H, and show how to find an element g of G such that φ(g) = h.

Have you defined the mapping φ from M to P₂ˆ(x)? (You need to define it to show that it is operation-preserving, and verify that it is 1 to 1 and onto.)

If \(m=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) in M, what is the image φ(m) in P₂ˆ(x)?

Jenny gave it a try, but was still confusing the groups with the mapping:

Thank you. I am struggling with the vocabulary. Would that mean the matrix is one to one because there exists a pivot in every column and it is onto because it is invertible.

The binomial is 1-1 because G(x₁) = G(x₂).

a(x₁)^2 + b = a(x₂)^2 + b

x₁ = x₂

and since it is invertible it is onto.

I thought I proved mapping when I used the star product with ab matrix and cd matrix to get star product as shown.

Doctor Fenton dove into the details, explaining what it means to define a mapping:

You have two groups: M, a set of 2×2 matrices; and P₂ˆ(x), a set of polynomials. You must define a mapping (or function) φ from M to P₂ˆ(x). To define φ, for each matrix m in M, you must describe the polynomial p(x) which is the image of m: φ(m) = p(x).

Let \(m=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) in M. What polynomial p(x) do you want to map this matrix m to?

To specify p(x), you must specify the coefficients of p(x): p(x) = ( )x² + [ ]. What numbers must you put in the parentheses and brackets? Those coefficients should be determined by the entries in the matrix m. You need to choose the coefficients for p(x) so that φ is a homomorphism (preserves the operations: if φ(m₁) = p₁(x) and φ(m₂) = p₂(x), then φ(m₁m₂) = p₁(x)*p₂(x), where m₁m₂ is the ordinary matrix product, and p₁(x)*p₂(x) uses the *-product you studied earlier).

Then you need to show that φ is 1 to 1: if m₁ and m₂ are matrices such that φ(m₁) = φ(m₂), then m₁ = m₂.

Finally, given any polynomial p(x) in P₂ˆ(x), find a matrix m in M such that φ(m) = p(x).

Jenny got the key ideas:

Mapping: Given M = \(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) and p(x) = ax^2 + b,

let Φ(m) = p(x), and a, b ∈ M.

Then \(\begin{bmatrix}a & b \\0 & b \\\end{bmatrix}\) = ax^2 + b

Operation-preserving: Given m1 = \(\begin{bmatrix}a & b \\0 & b \\\end{bmatrix}\) and m1 = \(\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\) and

p(x) = ax^2 + b, q(x) = cx^2 + d

then p(x)*q(x) = (ac)x^2 (ad + bc).

Then Φ(m₁m₂) = p₁(x)*p₂(x)

\(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\begin{bmatrix}c & d \\0 & d \\\end{bmatrix}\) = (ax^2 + b)*(cx^2 + d)

\(\begin{bmatrix}ac & ad+bc \\0 & ac \\\end{bmatrix}\) = (ac)x^2 + (ad + bc)

For 1-1: Do I repeat what you said or do I have to prove it?

given m₁ and m₂ are matrices

then Φm₁ = Φm₂

m₁ = m₂

I think I missed proving onto.

Doctor Fenton approved the main ideas and put the rest into proper form:

Very good work! You have seen all the patterns, but you are not saying what you need to say.

You correctly identified how to define the mapping φ from M to P₂ˆ(x):

If \(m=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\), then φ(m) = ax² + b.

That defines the mapping φ.

Now you need to show it is a group homomorphism.

You write

Given m1 = \(\begin{bmatrix}a & b \\0 & b \\\end{bmatrix}\) and m2 = \(\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\) and

p(x) = ax^2 + b, q(x) = cx^2 + d

then p(x)*q(x) = (ac)x^2 (ad + bc).

Then Φ(m₁m₂) = p₁(x)*p₂(x)

\(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\begin{bmatrix}c & d \\0 & d \\\end{bmatrix}\) = (ax^2 + b)(cx^2 + d)

What you want to say is

Let \(m_1=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) and \(m_2=\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\), then φ(m₁) = ax² + b and φ(m₂) = cx² + d.

The matrix product \(m_1m_2=\begin{bmatrix}ac & ad+bc \\0 & ac \\\end{bmatrix}\), so φ(m₁m₂) = (ac)x² + (ad + bc).

The *-product φ(m₁)*φ(m₂) = (ax² + b)*(cx² + d) = (ac)x² + (ad + bc).

Comparing φ(m₁m₂) and φ(m₁)*φ(m₂), they are the same, so φ is a homomorphism.

For 1-1: If φ(m₁) = φ(m₂), where \(m_1=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) and \(m_2=\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\), then ax² + b = cx² + d.

But ax² + b = cx² + d if and only if a = c and b = d.

Then \(\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}=\begin{bmatrix}c & d \\0 & c \\\end{bmatrix}\) because all their entries are the same. Therefore, m₁ = m₂, so φ is 1 to 1.

To show φ is onto: Let p(x) = ax² + b. Then the matrix \(m=\begin{bmatrix}a & b \\0 & a \\\end{bmatrix}\) is in M, and φ(m) = p(x), so φ is onto.

These are essentially what Jenny had, but written in proper style.
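To see the three properties in action on concrete numbers, here is a small Python sketch. It only spot-checks a finite sample, so it illustrates the proof rather than replacing it. The names `phi`, `phi_inverse`, `mat_mul`, and `star` are illustrative, a pair `(a, b)` is assumed to stand for ax² + b, and the sample uses nonzero diagonal entries on the assumption that M requires a ≠ 0, matching the polynomial set.

```python
from fractions import Fraction
from itertools import product

def mat_mul(m, n):
    """Ordinary product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def star(p, q):
    """The *-product on pairs (a, b) standing for ax^2 + b."""
    (a, b), (c, d) = p, q
    return (a * c, a * d + b * c)

def phi(m):
    """Map the matrix [[a, b], [0, a]] to the pair (a, b)."""
    (a, b), _ = m
    return (a, b)

def phi_inverse(p):
    """Given (a, b), return the matrix that phi sends to it."""
    a, b = p
    return ((a, b), (0, a))

# A finite sample: nonzero a values and arbitrary b values.
a_values = [Fraction(n, 2) for n in (-3, -1, 1, 2, 5)]
b_values = [Fraction(n) for n in (-2, 0, 1, 4)]
pairs = list(product(a_values, b_values))
matrices = [phi_inverse(p) for p in pairs]

# Operation-preserving: phi(m1 m2) equals phi(m1) * phi(m2).
homomorphism_ok = all(
    phi(mat_mul(m1, m2)) == star(phi(m1), phi(m2))
    for m1, m2 in product(matrices, repeat=2)
)
# One-to-one on the sample: distinct matrices give distinct images.
one_to_one_ok = len({phi(m) for m in matrices}) == len(matrices)
# Onto on the sample: every pair is the image of some matrix.
onto_ok = all(phi(phi_inverse(p)) == p for p in pairs)

print(homomorphism_ok, one_to_one_ok, onto_ok)
```

Note that `phi` takes a matrix and returns a polynomial (as a pair), which mirrors the point that only images of matrices can be compared with polynomials.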

Originally, you wrote equations saying m₁m₂ = p(x)*q(x), which are nonsense. A matrix is not a polynomial. An image φ(m) is a polynomial, so it makes sense to write

φ(m₁m₂) = φ(m₁)*φ(m₂),

but not to write

m₁m₂ = φ(m₁)*φ(m₂).

The two sides of an equality must be the same thing: in this case, both matrices; or both polynomials. The matrix product is not a polynomial, but the image of a matrix product is a polynomial, and it is the *-product of the two polynomial images of the matrices.

This is the reason for talking about the homomorphism φ in the first place!

Jenny:

Thank you thank you thank you!! I am going to review it and see if I can transfer this information to a similar problem. Again. Thank you so much.

Doctor Fenton went back to make some important explanations:

I concentrated on the math question in my last reply, and I neglected to answer some of your questions from a previous reply. In particular, you wrote

Thank you. I am struggling with the vocabulary. Would that mean the matrix is one to one because there exists a pivot in every column and it is onto because it is invertible?

The binomial is 1-1 because G(x₁) = G(x₂).

a(x₁)^2 + b = a(x₂)^2 + b

x₁ = x₂

and since it is invertible it is onto.

You have several misconceptions here. First, a matrix is an object, a rectangular array of numbers. It is not meaningful to say that a matrix is 1 to 1. Being “1 to 1” is a property of functions (also called mappings or transformations), not a property of matrices.

However, a matrix does define a function (in linear algebra, this function is called a linear transformation), and it makes sense to say that this linear transformation is 1 to 1, onto, or invertible (or has none of these properties). Whether this transformation is 1 to 1 does not depend upon whether each column has a pivot (when in row-echelon form), but is a more complicated issue. (If the matrix is square, then it has a number called its determinant. If the determinant is non-zero, then the transformation defined by the matrix is 1 to 1, onto, and invertible.) Not all matrices are invertible. Every non-square matrix is not invertible.

Similarly, a polynomial is an object, a certain type of formula. As with matrices, a polynomial as an object doesn’t have the property of being 1 to 1. But polynomials are also functions from the real numbers R to R. However, in this problem, the polynomials in P₂ˆ(x) are not being considered as functions, but rather just polynomial objects. If you do want to consider such binomials as functions, then these quadratic polynomials are never 1 to 1: for any p(x) in P₂ˆ(x), p(-2) = p(2), since a(-2)² + b = a(2)² + b, so there are two values of x giving the same value. Also, they are never onto: if a > 0, then ax² + b ≥ b is always true, so for any c < b, there is no value of x such that ax² + b = c. A function from R to R is invertible if and only if it is 1 to 1 and onto, so these quadratics are never invertible.

So you need to be careful about the vocabulary.

Jenny acknowledged the importance of these ideas:

Thank you. That is one reason I was so confused.

I know that the graph of a polynomial is not one to one or onto so I didn’t understand how I was to prove that it was both. But it seems that 1-1 and onto refer to the mapping of one to the other not the algebra vertical or horizontal line test type of testing functions.

Doctor Fenton agreed:

That’s right!

Today we’ll look at a classic algebra word problem: Finding how long it takes to fill a cistern through two pipes, with a drain open. But usually these problems are given with specific numbers, as a simple exercise in algebra. What if it’s all variables? The discussion provides some ideas on how to approach a problem that’s a little harder than you are used to.

Here is the problem, from late April:

A basin is filled by two pipes separately at p and q minutes, respectively. The cistern is emptied in r minutes by another tube. The 1st tube was closed s minutes after the three tubes were opened together. How long will it take to complete the cistern?

This is a classic, as I said. Of course, in normal usage one would close the drain before starting to fill; in this scenario, we turn on both inlets along with the drain, and then turn off one of the inlets while still keeping the drain open! Less math would be needed if only these people did the right thing … but then, that’s always been true of word problems, which are designed as challenges, not realistic scenarios!

Doctor Rick answered in the way we usually do when we are given no work by which to judge a “patient’s” needs:

Hi, Zawad.

Because you have not shown any work on the problem (as we asked you to do on the submission page), I can’t be sure what methods you have learned or where you are having difficulty.

I am fairly sure that you would not have been assigned such a problem, with only variables and no numbers, unless you had previously learned how to solve similar problems with numbers – for instance,

A basin is filled by two pipes separately in 10 minutes and 8 minutes, respectively. The cistern is emptied in 20 minutes by another tube. The basin was empty and the three tubes were opened at the same time; then after 5 minutes the first tube was closed.

How much longer will it take to completely fill the cistern?

Can you solve this problem? I would approach it by working with rates, in “basins per minute”. In other words, what fraction of the basin is filled (or emptied) by each pipe in a minute? Then I could work out the fraction that remains to be filled after the first 5 minutes, and go on from there.

If you still can’t solve this one, can you show me a similar problem, probably a little simpler, that you can solve?

And if you can solve my problem with numbers, can you show me your work on the original problem with variables only, so I can see where you are having trouble?

He has done several things here. First, he has asked to see some work, in order to see more clearly where Zawad needs help. Second, he has suggested the simpler problem as a way to ease into the problem, on the assumption that it is the multiplicity of variables that is the main issue. Third, and more subtle, he has clarified some of the wording of the problem, improving the English grammar without calling attention to it, explicitly indicating that the cistern started out empty, and also clarifying what time is to be determined: “how much longer (after the first tube is closed)”, rather than how long in total. The latter detail may turn out not to be what was expected (especially if the problem has been translated from another language that made it clearer). All of these things are helpful in dealing with a challenging problem.

He also suggested a standard method for solving such a problem, expecting that to be only a reminder (and therefore not going into too much detail, and trusting that Zawad will ask for more if needed).

Here is the new problem:

Zawad replied, confirming the central issue and solving the simpler problem with ease:

Yes, I haven’t solved such problem with only variables. I’ve tried to solve the math you given.

Together three pipes can fill {(1/10) + (1/8) – (1/20)} = 7/40 portion in one minute.

Together they were opened for five minutes. Thus the basin filled 7/8 portion and left (1-(7/8)) portion = 1/8.

Now, together 2nd and 3rd pipe can fill 1/8 – 1/20 = 3/40 portion in one minute. The remaining ⅛ portion will be filled in 40/3 × 1/8 = 5/3 minute.

So, the total time is = 5 + (5/3) = 20/3 = 6.67 minute (approx).

Is this right? I think the problem with variable is s + [qr(pqr – qrs – rsp – spq)/(r – q)]

Let’s examine his solution of the numerical problem, filling in some details he didn’t need to show, knowing we would understand:

The three pipes respectively increase the amount of liquid in the cistern by \(\frac{1}{10}\), \(\frac{1}{8}\), and \(-\frac{1}{20}\) of its volume per minute, each rate being the reciprocal of the time it takes to fill, and the negative meaning it drains it.

Therefore, together the volume increases by \(\frac{1}{10}+\frac{1}{8}-\frac{1}{20}=\frac{4}{40}+\frac{5}{40}-\frac{2}{40}=\frac{7}{40}\) per minute.

Over the first 5 minutes, the level in the cistern will reach \(\frac{7}{40}\times 5 = \frac{7}{8}\). There is \(\frac{1}{8}\) left to fill.

After closing one, there are two pipes open, so the new fill rate is \(\frac{1}{8}-\frac{1}{20}=\frac{5}{40}-\frac{2}{40}=\frac{3}{40}\) per minute.

To find the number of minutes to fill \(\frac{1}{8}\) at a rate of \(\frac{3}{40}\) per minute, we divide: \(\frac{1}{8}\div\frac{3}{40}=\frac{1}{8}\times\frac{40}{3}=\frac{5}{3}=1\frac{2}{3}\) minutes.

We can also check the answer. The first pipe ran for 5 minutes, and added \(\frac{5}{10}=\frac{1}{2}\) to the cistern. The second pipe ran for \(6\frac{2}{3}\) minutes, and added \(\frac{1}{8}\times\frac{20}{3}=\frac{5}{6}\) to the cistern. The third pipe ran for \(6\frac{2}{3}\) minutes, and removed \(\frac{1}{20}\times\frac{20}{3}=\frac{1}{3}\) from the cistern. This leaves a total of \(\frac{1}{2}+\frac{5}{6}-\frac{1}{3}=\frac{3}{6}+\frac{5}{6}-\frac{2}{6}=\frac{6}{6}=1\) full cistern.
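The same arithmetic can be replayed with exact fractions. This is just a sketch of the steps above in Python; the variable names are my own, and the cistern's volume is taken as 1.

```python
from fractions import Fraction

# Rates in cisterns per minute: two inlets and one drain.
inlet_1 = Fraction(1, 10)
inlet_2 = Fraction(1, 8)
drain = Fraction(1, 20)
minutes_before_closing = 5

rate_all = inlet_1 + inlet_2 - drain            # all three pipes open
filled = rate_all * minutes_before_closing      # level after 5 minutes
remaining = 1 - filled                          # still to be filled
rate_after = inlet_2 - drain                    # first inlet closed
extra_minutes = remaining / rate_after
total_minutes = minutes_before_closing + extra_minutes

# Check: add up what each pipe contributed over the time it was open.
final_level = (
    inlet_1 * minutes_before_closing
    + inlet_2 * total_minutes
    - drain * total_minutes
)

print(rate_all, filled, remaining, rate_after, extra_minutes, total_minutes, final_level)
```

Working in `Fraction` rather than floating point keeps results such as 20/3 exact instead of 6.67.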

Zawad explained this very clearly, showing that he understands the underlying principles well. But what of his answer to the real problem? $$s + \frac{qr(pqr - qrs - rsp - spq)}{(r - q)}$$ for which no work was shown?

Doctor Rick acknowledged the correct numerical answer, and pointed out a problem in the variable answer:

Thanks for your response, Zawad. You got the correct answer for my version of the problem with numbers.

Your result for the original problem does not look quite right to me: p, q, and r are in units of minutes, so your expression for the time has units of minutes⁴. But probably most of what you did (whatever it was) was correct. Could you show me your work? I’d like to see, at least, your expression for the fraction of the basin remaining to be filled when the first pipe is closed.

In Zawad’s fraction, the numerator is a product of 5 numbers of minutes, and the denominator is a number of minutes, leaving a fourth power of minutes. So something is wrong.

Zawad showed his work, redone:

Together three pipes can fill {(1/p) + (1/q) – (1/r)} portion in one minute.

Together they were opened for s minutes. Thus the basin filled {s(qr + pr – pq)}/pqr portion and left (pqr – sqr – spr + pqs)/pqr portion.

Now, together 2nd and 3rd pipe can fill (1/q) – (1/r) = (r-q)/qr portion in one minute. The remaining portion will be filled in (pqr – sqr – spr + pqs)/p(r – q) minute.

So, total time is s + {(pqr – sqr – spr + pqs)/p(r – q)} = {qr(p – s)}/{p(r – q)}

Sorry, I had a mistake in calculation while solving yesterday.

The method here closely parallels the method with specific numbers; that is one reason for trying a numerical version of the problem, which I often think of as a “dry run” to get used to a process in a less intimidating setting. Zawad may not have needed that practice. We won’t get to see the error he previously made, which was probably minor. Sometimes just redoing the work with someone watching (or with yourself paying closer attention) is all it takes to correct an error!

In case you don’t follow the details of the work, here it is again, with some gaps filled in:

Rate per minute with all three pipes open: $$\frac{1}{p}+\frac{1}{q}-\frac{1}{r}=\frac{qr}{pqr}+\frac{pr}{pqr}-\frac{pq}{pqr}=\frac{qr+pr-pq}{pqr}$$

Amount filled in *s* minutes: $$s\left(\frac{qr+pr-pq}{pqr}\right) = \frac{sqr+spr-spq}{pqr}$$

Amount yet to be filled: $$1-\frac{sqr+spr-spq}{pqr} = \frac{pqr}{pqr}-\frac{sqr+spr-spq}{pqr} = \frac{pqr-sqr-spr+spq}{pqr}$$

Rate per minute with only pipes 2 and 3 open: $$\frac{1}{q}-\frac{1}{r}=\frac{r}{qr}-\frac{q}{qr}=\frac{r-q}{qr}$$

Minutes to fill remaining amount: $$\left(\frac{pqr-sqr-spr+spq}{pqr}\right)\div\left(\frac{r-q}{qr}\right) =\\ \left(\frac{pqr-sqr-spr+spq}{pqr}\right)\cdot\left(\frac{qr}{r-q}\right) =\\ \frac{(pqr-sqr-spr+spq)(qr)}{pqr(r-q)} =\\ \frac{pqr-sqr-spr+spq}{p(r-q)}$$

If it is the total time that is required, we have to add on the initial *s* minutes: $$s+\frac{pqr-sqr-spr+spq}{p(r-q)} =\\ \frac{sp(r-q)+(pqr-sqr-spr+spq)}{p(r-q)} =\\ \frac{spr-spq+pqr-sqr-spr+spq}{p(r-q)} =\\ \frac{pqr-sqr}{p(r-q)} = \frac{qr(p-s)}{p(r-q)}$$
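One way to gain confidence in the algebra is to compare the step-by-step computation with the simplified formula for a few sets of numbers. The sketch below does that in Python; the function names and the sample values (other than 10, 8, 20, 5) are my own, and integer inputs are assumed for simplicity.

```python
from fractions import Fraction

def total_time_stepwise(p, q, r, s):
    """Follow the derivation one step at a time, using exact fractions."""
    rate_all = Fraction(1, p) + Fraction(1, q) - Fraction(1, r)
    remaining = 1 - s * rate_all                 # amount yet to be filled
    rate_after = Fraction(1, q) - Fraction(1, r)
    return s + remaining / rate_after

def total_time_formula(p, q, r, s):
    """The simplified result qr(p - s) / (p(r - q))."""
    return Fraction(q * r * (p - s), p * (r - q))

# Sample inputs chosen so that the filling process makes physical sense.
cases = [(10, 8, 20, 5), (12, 6, 30, 2), (15, 10, 40, 3)]
for case in cases:
    print(case, total_time_stepwise(*case), total_time_formula(*case))
```

Agreement on a handful of cases does not prove the identity, but a slip in the algebra would almost certainly show up as a mismatch.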

Doctor Rick confirmed the answer, and took it a little further:

Yes, that’s the answer I got (the second time I tried; it is easy to make a mistake in solving!).

Examining the expression

$$\frac{qr(p-s)}{p(r-q)}$$

I see first that it has the correct units (minutes).

Then I notice the (r – q) in the denominator, indicating that something problematic happens if r = q – and indeed, this means that the second input (which flows the entire time) balances the outflow, so that once the first input is turned off, the level in the basin never changes.

I am a little puzzled by the (p – s) in the numerator. If p = s then the first input pipe alone will fill the basin in the s minutes that it is open; but how could the total time to fill the basin be zero minutes? Clearly the model breaks down for some values of the variables. The “additional” time needed to fill the basin after the first pipe is closed can be negative, in which case we have an unphysical situation. In particular, if p = s then this additional time is –s, so that adding s gives a total time of zero.

This sort of examination is useful in several ways; we check our answer by looking at it from various perspectives, and also try to learn whatever can be learned from it. Or it may just satisfy curiosity.

First, he gave the answer a “sanity check”, making sure it wasn’t nonsense by checking the units (minutes cubed over minutes squared = minutes).

Second, he observed features of the rational function we obtained, particularly zeros and asymptotes, to see if they make sense in the problem. (We can ignore impossible cases where *p*, *q*, or *r* are zero.) Looking at the denominator, the time will be infinite if the second and third pipes have the same rate (\(r-q=0\)), since after the first pipe is turned off, we just have a steady state, with as much water going out as comes in. That makes sense. (We should also observe that if \(r<q\), so that it takes longer to fill through the second pipe alone than to drain through the third, the time will be negative, which is again an impossible case: water just passes through, leaving the cistern empty and never allowing it to fill. You can’t take water out of an empty cistern.)

The case \(p=s\) is odd, as he explained. Taking it a little further, the formula for the time needed after closing the first pipe is $$\frac{pqr-sqr-spr+spq}{p(r-q)}=\frac{qr(p-s)-sp(r-q)}{p(r-q)}$$ In this form, we can see that if \(p=s\), this time is \(-s\). Indeed, if \(p<s\), so that the cistern would be filled through the first pipe alone, before the time to close it, then the whole procedure makes no sense.

To write a proper formula, we should determine when it is valid, by considering conditions under which various quantities along the way to the solution become negative, so that the rest is invalid. For example, among the required conditions would be that the drain is not faster than the two inlet pipes together (so that the cistern does not just remain empty for the first *s* minutes), and that the combination of all three pipes does not overflow the cistern within the first *s* minutes.
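Here is a rough Python sketch of what such a validity check might look like. The function name `is_physical` and the exact list of conditions are my own reading of the discussion above, not a complete analysis, and integer inputs are assumed.

```python
from fractions import Fraction

def is_physical(p, q, r, s):
    """Return True if the formula's scenario makes physical sense.

    p, q: minutes for each inlet to fill the cistern alone
    r:    minutes for the drain to empty it
    s:    minutes until the first inlet is closed
    """
    if min(p, q, r, s) <= 0:
        return False
    rate_all = Fraction(1, p) + Fraction(1, q) - Fraction(1, r)
    rate_after = Fraction(1, q) - Fraction(1, r)
    level_at_s = s * rate_all
    if rate_all <= 0:
        return False        # the drain keeps the cistern empty
    if level_at_s > 1:
        return False        # the cistern overflows before the first inlet closes
    if level_at_s == 1:
        return True         # exactly full at the moment of closing
    return rate_after > 0   # the level must keep rising afterwards

print(is_physical(10, 8, 20, 5))   # the numerical example
print(is_physical(10, 8, 8, 5))    # r = q: steady state, never fills
print(is_physical(5, 8, 20, 5))    # p = s: already overflowing by time s
```

The case p = s fails here because the first inlet alone fills the cistern in s minutes, so with the second inlet also outpacing the drain the level passes 1 before the first inlet is closed.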

But we can do one more thing: Put the numbers from the sample problem into our general formula, to confirm that we get the same answer. We have \(p=10, q=8, r=20, s=5\), so the formula gives $$\frac{qr(p-s)}{p(r-q)} = \frac{(8)(20)(10-5)}{10(20-8)} = \frac{800}{120} = \frac{20}{3} = 6\frac{2}{3}$$ as before.

Until now I have been using my backlog of topics for the blog, accessing pages through The Wayback Machine; but as it has become impossible to search for relevant pages on new topics, I am stopping the weekly posts based on the archive, and will be only posting weekly about recent questions. This coincides with the start of summer, when I am happy to have a little less to do; and it also fits with the original plan to eventually focus on new material. When the archives become available again, I will return to using them, but perhaps less often. There is still a lot of material worth revisiting.

Until then, if you need to access a page I’ve linked to (in early posts I often left out much of their contents), you can type in the address at https://web.archive.org/ . Here is the home page.


Here is the initial question, from early April:

I need to solve this problem:

I know the 4 things that need to be proven, and associative has been completed as sidework by the professor. I just do not know how to even start the problem.

A group is “a **set** equipped with an **operation** that combines any two elements to form a third element while being **associative** as well as having an **identity** element and **inverse** elements.” Hidden here is a fourth requirement, that the operation must be **closed** (that “third element” must be a member of the same set). The professor took care of what is sometimes the hardest part, proving associativity (we’ll take a look at that at the end), leaving three for Jennifer to prove.

The set under discussion, called \(\hat{P}_2\), is a particular set of polynomials, namely those that can be written in the form \(ax^2 + b\) with real coefficients (second-degree real polynomials lacking a linear term). The big challenge for the newcomer is that the operation called “*” is not ordinary multiplication of polynomials, but something peculiar, made up for this problem. (From experience, it happens that I immediately recognized this operation as something familiar in disguise. We’ll take a look at that, too.) But for now, we need to help Jennifer!

Doctor Fenton answered her, just checking what she knows by carefully stating what has to be done:

Hi Jennifer,

The three properties you need to prove are (1) closure; (2) existence of an identity; and (3) existence of inverse elements.

The statement gives you the definition of the underlying set P₂ˆ(x): it consists of polynomials of the form ax² + b, where a and b are real numbers and a ≠ 0, i.e. real polynomials whose degree is exactly 2. It also defines the binary operation p(x)*q(x). (Note that this operation is NOT the usual polynomial multiplication!)

(1) Closure: Is the product p(x)*q(x) an element of P₂ˆ(x)? That is, does it satisfy the conditions to be in P₂ˆ(x)?

(2) Is there an identity element? That is, is there a polynomial e(x) such that for every p(x) in P₂ˆ(x), e(x)*p(x) = p(x)? (If there is such an element e(x) = cx² + d, what properties must the coefficients c and d satisfy?)

(3) Does every element in P₂ˆ(x) have an inverse? Given p(x) = ax² + b, is there an element q(x) = cx² + d such that p(x)*q(x) = e(x)?

Can you answer any of these questions?

Jennifer replied,

Thank you. That is a better explanation of what they’re looking for. I understand the three things I’m supposed to do; I just don’t understand how to start doing them. I don’t understand how to prove that there’s an inverse. I don’t understand how to show that there is an identity. I’m actually completely lost.

This is not very surprising! She has presumably seen examples of how to prove something is a group, but every example is so different, it can be hard to see how to apply examples to a new problem.

Where shall we start? It might have been most natural to start by demonstrating what closure means in this particular case, but Doctor Fenton chose to start with the identity and the inverse, which Jennifer had explicitly asked about:

You have to know what the identity element is before you can find an inverse. Suppose p(x) = ax² + b.

An identity element will be an element e(x) = cx² + d, and it must satisfy p(x)*e(x) = p(x).

Compute the product on the left side. It gives you a polynomial with coefficients given by the product formula in the problem. That product must be the same as p(x). Can you see values of c and d which will make that true? (The coefficient of x² in the product must be the same coefficient as x² in p(x). What equation does that give you?) Similarly, the constant term in the product must be the same as the constant term in p(x). You need to find values of c and d which will make these equations true for all values of a and b.

So we are trying to find a polynomial such that “multiplying” any polynomial in the set by it will leave it unchanged. He is suggesting writing a pair of equations and solving for the needed coefficients.

Jennifer responded,

Do you mean if c is 0 and d is 1 that would mean (ax² + b)(0x² + 1) = ax² + b?

She sees the obvious polynomial that would work as an identity **if** we were using ordinary polynomial multiplication, namely the polynomial 1. So she clearly has the idea of an identity. But there are two problems: These aren’t just any polynomials, and the “multiplication” is not ordinary multiplication! Doctor Fenton replied:

First of all, the identity e(x) must be an element of P₂ˆ(x). Is 0x² + 1 an element of this set? What were the requirements that e(x) = cx² + d be in P₂ˆ(x)?

Perhaps it is confusing to call this way of combining two elements of the set a product, since although the elements of the set are written as polynomials, they don’t have the usual properties of polynomials. The given rule

(ax² + b)*(cx² + d) = (ac)x² + (ad + bc)

is just a way to take two elements of the set, and create a third element of the set. I’ll call this a “*-product” (read “star-product”) of the two elements.

What is the product (ax² + b)*(0x² + 1), according to the given rule for the *-product?

(One thing that seems to be taken for granted in this problem, is that two elements p(x) = ax² + b and q(x) = cx² + d are equal if and only if a = c and b = d. You need to use that fact.)

In some groups, equality is defined differently; I’ll be giving an example at the end. Here, polynomials are considered equal if they are the same polynomial, as usual.

The key step that hasn’t actually been done yet is to write out what an identity is in terms of the definition of the operation. I imagine he is giving Jennifer a chance to figure this out on her own, which will help her internalize it.

Jennifer answered:

I am so sorry but I just don’t understand. Maybe the first thing I don’t understand is what a star product is. I understand what an identity element is. I understand it is what maintains the original element. In addition it would be zero, in multiplication it would be one.

She is indeed understanding what an identity is. If the operation were addition, the identity (called the additive identity) would be 0, since \(a+0=a\) for any number *a*; and if the operation were multiplication, the identity (the multiplicative identity) would be 1, since \(a\cdot 1=a\) for any number *a*. The new thing, that is taking time to absorb, is the very idea of a made-up operation.

An hour later she added several good ideas:

So… If (ac)x² has to equal ax² would c not be one? And if ad needs to = a, would that mean d needs to equal 1 as well? Or am I still confused? I just realized that formula was the definition of a star product.

Although this is not quite right, she has seen the central issue: She has to use the definition of the operation in order to see what the identity element will be.

Doctor Fenton replied:

You are partly right. Since the product p(x)*e(x) must equal p(x), then

p(x)*e(x) = (ax² + b)*(cx² + d) must equal p(x) = ax² + b.

I still haven’t seen you write out the *-product p(x)*e(x). (If you don’t like that name, just call it the product, but it is not the product in the usual sense of polynomials.) It is a binary operation on these polynomial-looking elements of P₂ˆ(x): you give me p(x) and q(x), and I use them to make a third element using the recipe

(ax² + b)*(cx² + d) = (ac)x² + (ad + bc)

This *-product must be the same as ax² + b, so by the equality rule I mentioned earlier (which the problem seems to just assume), then you are correct that ac must equal a, and so c must equal 1. But then you say

And if ad needs to = a, would that mean d needs to equal 1 as well?

Where did you see that ad must equal a? Looking at the formula for the *-product, the constant term is ad + bc, and so ad + bc must equal b. If c = 1, what value of d will make ad + bc = b?

If the identity is \(cx^2+d\), then the definition of identity means that, for any \(ax^2+b\), $$(ax^2+b)*(cx^2+d)=ax^2+b$$ that is, $$(ac)x^2+(ad+bc)=ax^2+b$$ What must *c* and *d* be?

Jennifer did just this, but using *u* and *v* instead of *c* and *d* (likely having seen something similar in an example she had):

So… (ax² + b)(e) = ax² + b

By definition with * product (ax² + b)(ux² + v) = (au)x² + (av + bu)

With the identity I want (ax² + b)(ux² + v) = ax² + b

Then av + bu = b

If u = 1 then av + b(1) = b, which would only work if v = 0

Which would mean there is an identity element where e = (ux² + v)

She has omitted the “*” in her multiplications, which makes what she wrote a little confusing, but she did the right things. Here is her work written a little more correctly: $$(ax^2+b)*(ux^2+v)=ax^2+b\\ (au)x^2+(av+bu)=ax^2+b\\ au=a, av+bu=b\\au=a\Rightarrow u=1\\ av+b(1)=b\Rightarrow av=0\Rightarrow v=0$$

Doctor Fenton just had to state the conclusion:

Yes, but put in the values of u and v you found, so that the identity element is e(x) = 1x² + 0 = x². Now you are ready to tackle the inverse. If p(x) = ax² + b, can you find an element q(x) = cx² + d such that p(x)*q(x) = e(x)?

So the identity element is \(x^2\); this works as an identity because, for any \(p(x) = ax^2+b\), $$p(x)*e(x) = (ax^2+b)*(1\cdot x^2+0)= (a\cdot 1)x^2+(a\cdot 0+b\cdot 1)=ax^2+b=p(x)$$
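A short Python sketch can make the contrast with Jennifer's first guess concrete. Here a pair `(a, b)` is assumed to stand for ax² + b, and the names `star`, `identity`, and `ordinary_one` are illustrative only.

```python
from fractions import Fraction

def star(p, q):
    """The *-product on pairs (a, b) standing for ax^2 + b."""
    (a, b), (c, d) = p, q
    return (a * c, a * d + b * c)

identity = (1, 0)        # x^2, the identity found above
ordinary_one = (0, 1)    # 0x^2 + 1, the first guess; not in the set, since a = 0

samples = [
    (Fraction(3), Fraction(-2)),
    (Fraction(1, 2), Fraction(7)),
    (Fraction(-4), Fraction(0)),
]

for p in samples:
    # star(p, identity) returns p unchanged; star(p, ordinary_one) gives (0, a).
    print(p, star(p, identity), star(p, ordinary_one))
```

Under the *-product, combining ax² + b with 0x² + 1 produces 0x² + a, which is neither the original element nor a member of the set.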

Jennifer took on the new challenge (again omitting the *, but otherwise correct):

For inverse,

(ax² + b)(cx² + d) = x²

so (ac)x² + (ad + bc) = x²

so c = a⁻¹ and

ad + bc = 0

ad + ba⁻¹ = 0

ad = -ba⁻¹

Then I am stuck. I know that I can multiply both right sides by a to eliminate the a⁻¹ on the right side but I am not sure how that helps me

She is very close! If \((ax^2 + b)*(cx^2 + d) = x^2\), then \((ac)x^2 + (ad + bc) = 1x^2+0\), so that \(ac=1\) and \(ad+bc=0\), so \(c=\frac{1}{a}\) and \(ad+b\frac{1}{a}=0\). She just needs a little nudge …

Doctor Fenton:

Remember what you are trying to do: p(x) is given, so you know a and b, and you know that a ≠ 0. You are trying to FIND c and d. You found c. How can you find d from the equation ad = -ba⁻¹?

Jennifer repeated the same work, going a little farther:

Then

ad + bc = 0

ad = -bc

ad = -ba⁻¹

a⁻¹a d = -a⁻¹ba⁻¹

d = -a⁻¹b a⁻¹

These are not commutative, correct? So can I simplify beyond this?

Now she has absorbed a little *too* well the idea of abstract algebra! She is forgetting that at this point she is doing ordinary algebra on ordinary numbers. This, too, is not uncommon when first learning these ideas.

Doctor Fenton reminded her where she was:

But a, b, c, and d are real numbers, and the arithmetic used (ac and ad + bc) is ordinary multiplication and addition of real numbers, which ARE commutative. In fact, the group is commutative: p(x)*q(x) = q(x)*p(x). So, now you can write the inverse q(x) of p(x) = ax² + b.

(We hadn’t mentioned until now that **the star operation is commutative**; we don’t need to know this, but it was assumed when we defined the identity and the inverse without doing the operation on both sides. To see this, observe that $$p(x)*q(x) = (ax^2+b)*(cx^2+d) = (ac)x^2+(ad+bc)$$ while $$q(x)*p(x) = (cx^2+d)*(ax^2+b) = (ca)x^2+(da+cb) = (ac)x^2+(ad+bc)$$ These are equal, so the operation is commutative. There is a certain symmetry in the operation’s definition that makes this work, which is so obvious to those accustomed to it that it almost goes without saying – and almost did!)

Jennifer finished the work:

Then d = -b/a² or d = -ba⁻²

So the inverse of \(ax^2+b\) is \(a^{-1}x^2-ba^{-2} = \frac{1}{a}x^2-\frac{b}{a^2}\).
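As a quick check of my own (not part of the exchange), we can apply the definition of the star operation to \(p(x)\) and this candidate inverse, and confirm that the result is the identity \(x^2\): $$(ax^2+b)*\left(\frac{1}{a}x^2-\frac{b}{a^2}\right) = \left(a\cdot\frac{1}{a}\right)x^2+\left(a\cdot\left(-\frac{b}{a^2}\right)+b\cdot\frac{1}{a}\right) = 1x^2+\left(-\frac{b}{a}+\frac{b}{a}\right) = x^2$$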

Doctor Fenton:

That’s right! Good work!

Is this making more sense now? Math is sometimes thought of as an intellectual game, in which one can make up the rules, and this problem is an example. However, much, if not most, of the time, the rules are given by a situation, and seeing a structure such as a group can tell you things you wouldn’t normally realize. The idea of a group is very important in physics, for example.

Jennifer confirmed the central reason for her confusion, and continued to the fact we skipped over, namely closure:

Yes! Thank you so much! I am in abstract algebra but haven’t really taken math since Calculus 1, 25 years ago. I have found that I am missing a lot of information.

I thought star product was regular multiplication and I was so confused.

For the closure part of the proof can I just say that since it is given that a, b are real numbers and p, q are real numbers then p * q would also include real numbers and therefore be closed under multiplication or do I need more information?

No, it takes a little more than that. Doctor Fenton gave a reminder:

Remember that the two conditions for p(x) = ax² + b to be in P₂ˆ(x) were that a and b be real numbers, and a ≠ 0. You should show those properties for the product of two such elements. The product has ac and ad + bc. Those are real numbers. Is ac non-zero?

The reason that the two coefficients in the “product” are real numbers is that addition and multiplication of real numbers are closed (or rather, that the real numbers are closed under ordinary addition and multiplication). We just need to see that the coefficient of \(x^2\) in the “product” is non-zero. Is it always so?

Jennifer took a stab at it:

So… because we determined that ac = 1. Therefore a does not equal 0 and c does not = 0 so it is closed

Doctor Fenton corrected this reference to the wrong previous work:

No, ac doesn’t have to be 1, it can be any non-zero real number. For example, πx² + (ln 2) and (√2)x² – 1 are both elements, and their product is (π(√2))x² + (-π + (√2)(ln 2)), so ac need not be 1.

Jennifer made the required small fix:

Right. That was for the identity. Just when I think I have it. Then if a does not equal 0 and c does not equal 0 then ac does not equal 0 and it is closed.

Doctor Fenton confirmed the answer:

Right!

Jennifer closed with thanks:

Thank you so much for sacrificing your Easter to help me!! May God richly bless you.

Now let’s get back to a couple things I mentioned at the start.

The professor had already shown that the operation is associative, so we didn’t get to see that. Let’s fill in the gap, for our own sake.

The operation “*” is associative if, for any three elements *p*, *q*, and *r*, it is true that \((p*q)*r = p*(q*r)\). So we just have to compute those.

$$(p(x)*q(x))*r(x) = \left((ax^2+b)*(cx^2+d)\right)*(ex^2+f) =\\ \left((ac)x^2+(ad+bc)\right)*(ex^2+f) =\\ ((ac)e)x^2+((ac)f+(ad+bc)e) =\\ (ace)x^2+(acf+ade+bce)$$

$$p(x)*(q(x)*r(x)) = (ax^2+b)*\left((cx^2+d)*(ex^2+f)\right) =\\ (ax^2+b)*\left((ce)x^2+(cf+de)\right) =\\ \left((a(ce))x^2+(a(cf+de)+b(ce))\right) =\\ (ace)x^2+(acf+ade+bce)$$

These are equal, so we have it.
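For readers who like to test such algebra numerically, here is a minimal Python sketch of my own. It assumes we represent \(ax^2+b\) as the pair `(a, b)` and use exact rational numbers as stand-ins for real coefficients; the names `star`, `inverse`, and `IDENTITY` are hypothetical, chosen just for this illustration. A few sample checks are not a proof, of course, but they are a useful sanity test of the identity, inverse, associativity, and commutativity worked out above.

```python
from fractions import Fraction

def star(p, q):
    """Star product of p = (a, b) and q = (c, d), standing for ax^2+b and cx^2+d."""
    a, b = p
    c, d = q
    return (a * c, a * d + b * c)

# The identity element found above: 1x^2 + 0
IDENTITY = (Fraction(1), Fraction(0))

def inverse(p):
    """Inverse found above: (1/a)x^2 - b/a^2. Assumes a != 0."""
    a, b = p
    return (1 / a, -b / a**2)

# Three sample elements with rational coefficients (arbitrary choices)
p = (Fraction(2), Fraction(3))
q = (Fraction(-1, 2), Fraction(5))
r = (Fraction(7), Fraction(-4, 3))

print(star(p, IDENTITY) == p)                       # identity
print(star(p, inverse(p)) == IDENTITY)              # inverse
print(star(star(p, q), r) == star(p, star(q, r)))   # associativity
print(star(p, q) == star(q, p))                     # commutativity
```

Each line should print `True` for these samples, in agreement with the general computations.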

I mentioned something I saw from the start that made some of the answers obvious to me. What was it?

Compare the definition of “*”, $$p(x)*q(x) = (ax^2+b)*(cx^2+d) = (ac)x^2+(ad+bc)$$

to this definition of addition of fractions: $$\frac{b}{a}+\frac{d}{c} = \frac{ad+bc}{ac}$$

It appears that the operation works like adding fractions, if we equate \(ax^2+b\) to \(\frac{b}{a}\). That is, we can think of \(a\) as the denominator of a fraction and \(b\) as the numerator, and the operation as addition. And the fact that *a* can’t be zero agrees with the fact that a denominator can’t be zero.

Then since the **additive identity** for fractions is \(0 = \frac{0}{1}\), we can expect the identity for our operation to be \(1x^2+0\).

And since the **additive inverse** of \(\frac{b}{a}\) is \(-\frac{b}{a}\), the inverse of \(ax^2+b\) should be \(ax^2-b\). Well, not quite! As we saw, it’s \( \frac{1}{a}x^2-\frac{b}{a^2}\), which corresponds to the fraction \(\displaystyle\frac{-\frac{b}{a^2}}{\frac{1}{a}}\); this is *equivalent* to \(-\frac{b}{a}\) when you multiply numerator and denominator by \(a^2\).

What happened? Recall the comment that they implicitly take two elements as equal when corresponding coefficients are equal? In fractions, \(\frac{a}{b}=\frac{c}{d}\) not only when \(a=c\) and \(b=d\), but whenever \(ad=bc\). So in fractions, we are treating some *different* ordered pairs \((a,b)\) as equal, while in the group under discussion, they are treated as distinct. So our group is not *isomorphic* to the group of fractions under addition (that is, it doesn’t have *all* the same behavior), but it is closely related.
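To make the comparison concrete, here is a small Python sketch of my own, assuming integer coefficients for simplicity; the names `star` and `as_fraction` are hypothetical. It sends each element \(ax^2+b\) to the fraction \(\frac{b}{a}\), checks in one example that the star operation corresponds to adding those fractions, and then shows two *different* elements of the group landing on the *same* fraction.

```python
from fractions import Fraction

def star(p, q):
    """Star product of p = (a, b) and q = (c, d), standing for ax^2+b and cx^2+d."""
    a, b = p
    c, d = q
    return (a * c, a * d + b * c)

def as_fraction(p):
    """Send ax^2 + b to the fraction b/a. Assumes a != 0."""
    a, b = p
    return Fraction(b, a)

p = (2, 3)   # 2x^2 + 3, corresponding to 3/2
q = (4, 1)   # 4x^2 + 1, corresponding to 1/4

# The star product corresponds to the sum of the fractions
print(as_fraction(star(p, q)) == as_fraction(p) + as_fraction(q))

# Distinct elements of the group can correspond to equal fractions
u = (2, 3)   # 2x^2 + 3
v = (4, 6)   # 4x^2 + 6
print(u != v and as_fraction(u) == as_fraction(v))
```

Both lines should print `True`: the operation matches fraction addition, but the correspondence is many-to-one, which is why the two groups are related without being the same.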

Fractions can be technically defined as ordered pairs \((n,d)\) of real numbers in which *d* is non-zero, where two fractions are considered equal as stated above and operations are defined as I showed for addition. The group in this problem can similarly be thought of as ordered pairs \((a,b)\) with a few conditions and an operation defined. The fact that they were written as polynomials was not essential to the problem, but in fact disguised their similarity to fractions.