We’ve looked at the basic transformations of a function and how they affect its graph, then at how they combine, and then how they can interact with specific functions. Now let’s look at one problem from beginning to end, looking at a graph and finding the function that goes with it.

Here is the question, from 2017:

A Transformational Tour

When finding a function based on its graph, I tend to take the x and y coordinates of a point from that graph and plug them into the equation of the parent graph. I do that to find the constant by which the function's y value needs to be multiplied. However, I realize that this is only valid if the function involves a vertical transformation. To test for a horizontal transformation, I guess I'd need to divide x by a constant, and plug in the x and y coordinates of any point on that graph. But how do I know whether to test a graph for a vertical or a horizontal transformation in the first place?

Alex is describing a problem in which he is shown a graph of one of the basic “parent” functions (like a parabola) that has been transformed, and has to determine the transformations of the parent function to obtain the function shown. His initial question relates to what we looked at last time, where both horizontal and vertical stretches might be involved, or they might interact in such a way that it is up to you to decide which to use. I answered:

Hi, Alex. This is not an "either/or" situation; you can have a combination of BOTH horizontal and vertical transformations, which may also include stretches (or whatever you want to call them) mixed with shifts (translations) and reflections. Moreover, sometimes the same graph can be viewed as either a horizontal or vertical stretch (e.g., for a parabola); other times the two are distinct (e.g., for a trig function). See the first two references below for these two situations.

I'm not sure about a general rule; but like you, I tend to look at the shape of the graph, imagining that I have done any translations and reflections to bring the graphs together around a key point, e.g., a vertex; then looking at another point to find the appropriate scale factor(s). I've never thought of it this way before, but maybe for any graph with only one key point (e.g., the vertex of a parabola), you can choose arbitrarily between horizontal and vertical stretches, using an arbitrary point to determine the scale factor; while for those with more than one key point (e.g., center point and peak on a sine graph), using two of them tells you about distinct horizontal and vertical scale factors.

But a detailed answer depends on context. Do your graphs have only one transformation at a time? Or do you need to deal with combinations? And what kinds of functions are you working with?

In addition to that context (what functions and transformations you have learned), it may be helpful to see a specific example. If your graph is not already online at a web address you can reference in text, please post it on any picture-sharing site, and then give us its URL.

After this I gave links to the pages we looked at last time.

Alex responded with the information I’d asked for:

I'm working with Larson's Precalculus. Section 1.7 has some typical exercises: given one point on the transformed graph, determine what transformations happened to the graph, and what the function is. Here's an example comparable to what I need to do in my homework assignments (though, as I mentioned earlier, typically I need to figure out whether a given graph contains a vertical or horizontal transformation): http://www.mathmotivation.com/lessons/graphs-of-functions-ex4.html

I've learned all the parent functions so far -- including cubic, quadratic, square root, absolute value, constant, reciprocal, and step functions -- but with only one non-rigid transformation per graph. That is, each transformation involves only one type of shrinkage or stretching (though it may involve, say, both a stretch and a shift and a reflection in the x or y axis).

In this context, “non-rigid transformations” are stretches or shrinks (dilations); translations and reflections are “rigid transformations”, which keep the shape unchanged.

The link contains this problem:

Exercise 4 – Finding the Equation of a Given Graph

The graph of y = x^{2} is shown below. Also, a graph that is a shift, a reflection, and a vertical stretch of y = x^{2} is shown in green. Use the Function Graphing Rules to find the equation of the graph in green and list the rules you used. Verify your answer on your graphing calculator but be able to explain all the shift rules used.

Since the parent function is a parabola, we know from last time that (a) stretches can be assumed to be vertical, since that is equivalent to a horizontal shrink, and (b) horizontal reflections are irrelevant, because the graph has left-right symmetry. That keeps the problem simple.

I started with general comments:

Thanks. Since there are a number of slightly different ways to present a problem like this, it's very helpful not to have to imagine what might be "typical." And although what I say about this one will not be exactly what I would do with variant problems, the details will help me show my thinking clearly.

First, I see that you are working with the usual basic functions, and not with things like trig functions (yet). The only one of these for which there is a real difference between horizontal and vertical stretches is the step function; so if we need to do another example, that would be a good one to use. Only in that case would one bother with two stretches in combination.

Second, although here they tell you that they want a vertical stretch, I could have told you that from looking at the graph -- which makes this a good example. You'll see this as I proceed.

Another nice feature of this problem (which is perhaps necessary for a computer-based problem, or more generally when they want to have one indisputably correct answer) is that they have specified TWO points in each graph. Last time, I had called these the "key point" (the vertex) from which shifts and reflections can be determined, together with an "arbitrary point" that can be used to identify stretches. If they didn't provide these, then I would look for such a point myself, which can be a bit of an art! (They really show three points, but symmetry makes the third redundant.)

Something I didn’t say there is that it is not the graphs themselves, but the points shown, that tell me to use a vertical stretch. If they had shown different points, a horizontal compression might have been appropriate. Also, they chose a vertical stretch because, as I’ve mentioned, that is the easier one to use.

If Alex had been working with trigonometric functions, it would be necessary to handle both horizontal and vertical stretches; that was touched on last time, and I’ll be looking at that case later in this answer (and covering trig functions in a later post).

If they had stated the problem in words, it might have read thus:

    The graph of function f is a transformation of the function y = x^2 for which the vertex, (0, 0) in the parent function, transforms to (-2, 1), and the point (1, 1) transforms to (-1, -1).

That's all the information you need from the picture.

Now here's how I approach the problem. I'll go beyond just answering your main question of how to decide whether the stretch is horizontal or vertical -- so I can put all the pieces together for you, and you can see what I would do with somewhat different information.

So we’ll pretend they didn’t identify the transformations, and go through the entire process.

First, we'll find the translation and reflection. From a glance at the graph, I see that it is upside-down, so there is a reflection over the x-axis. From the vertices, I see that the graph has been shifted left 2 and up 1. I also know from experience that it's most natural if I think of the reflection and stretches as being done before the shifts. So I have this so far:

    Reflect over x-axis
    Stretch ?
    Shift left 2
    Shift up 1

Notice that I am not doing the transformations in the same order I see them; I have chosen the appropriate order first (based on ideas we’ve seen previously), and am then filling in the details as I see them.

Next, to find the stretch, I look at the other point, relative to the vertex. In the parent function, it is 1 unit right and 1 unit up from the vertex; in the new function, it is 1 unit right and 2 units down from the vertex. The fact that both are 1 unit to the right tells me that they are not thinking of it as a horizontal stretch. (That's good, because as I may have said in one of the links I provided, it's easier to handle vertical transformations; so if I had a choice, that is what I would go with.) The new graph goes down because of reflection over the x-axis; we could have used this to come to that conclusion. The fact that it goes down twice as far as the parent tells us that there is a vertical stretch by a factor of 2 -- that is, y coordinates are multiplied by 2.

So here are the transformations, as determined entirely from looking at the two given pairs of points:

    Reflect over x-axis
    Stretch vertically by factor of 2
    Shift left 2
    Shift up 1

Here are the transformations: red is the parent function; purple is the result of reflecting and stretching (multiplying by -2); blue is the result of shifting left and up.

Now to write the function, I subject the expression to successive transformations in the order listed above. The last link I gave you includes an explanation of the process as I do it, if this is different from what you have learned:

    Original                            y = x^2
    Reflect over x-axis                 y = -x^2
    Stretch vertically by factor of 2   y = -2x^2
    Shift left 2                        y = -2(x + 2)^2
    Shift up 1                          y = -2(x + 2)^2 + 1
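The same reasoning can be sketched in code. This is only an illustration, assuming the graph is a transformed parabola described with a vertical stretch (no horizontal dilation); the function name `parabola_from_points` is made up for this example. It reads the shifts from the vertex and the stretch (with its sign, which carries the reflection) from the second point.

```python
from fractions import Fraction

def parabola_from_points(vertex, point):
    """Return (a, h, k) so that y = a*(x - h)**2 + k passes through both points.

    Assumption: the graph is y = x^2 after a vertical stretch/reflection
    (the factor a) followed by shifts (h to the right, k up).
    """
    h, k = vertex
    x, y = point
    if x == h:
        raise ValueError("the second point must not lie directly above or below the vertex")
    # Relative to the vertex, the parent rises (x - h)^2; the new graph rises y - k.
    a = Fraction(y - k, (x - h) ** 2)
    return a, h, k

a, h, k = parabola_from_points((-2, 1), (-1, -1))
print(a, h, k)  # -2 -2 1, that is, y = -2(x + 2)^2 + 1
```

The negative value of `a` reflects the graph over the x-axis, and `h = -2` means a shift left by 2.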

So that is our function. Or is it? There was a lot of thinking involved, which means there are a lot of places for possible errors. Therefore …

The last thing I do is to check that my function really gives the graph they show. They ask you to use your calculator; I like to verify not just that it looks right, but that the points given, at least, work out exactly. So I put each given x into my function:

    (-2, 1):   f(-2) = -2(-2 + 2)^2 + 1 = 1
    (-1, -1):  f(-1) = -2(-1 + 2)^2 + 1 = -1
    (-3, -1):  f(-3) = -2(-3 + 2)^2 + 1 = -1

So everything looks good.

This example didn’t include everything that might have happened; if you want to learn well, it can be a good idea to try to give yourself a little extra challenge. I did that here:

Bonus: If they had marked (1, 1) on the parent and (-4, -7) on the new graph, I might have described it as a horizontal stretch by 2, AND a vertical stretch by 8, and reflections in BOTH axes! (Though textbook authors will never do this to you, see if you can work out why this description would apply; it's a good way to "stretch" your mind!) In that case, my resulting equation would be

    y = -8(-(x + 2)/2)^2 + 1

If we simplify that, we end up with the same result we would get from the answer I gave above, namely

    y = -2x^2 - 8x - 7

Here is a graph showing these transformations (red, vertical stretch to purple, horizontal stretch and reflection to green, shifts to blue):

Now, if they had intended you to use only a horizontal stretch (a compression, actually), they would have had to mark an x-intercept as (sqrt(2)/2 - 2, 0), and the stretch factor would be sqrt(2)/2. Again, you might want to play with this to see how that would work, and how the resulting equation would be equivalent -- but if they ever did this to you, it would be with a nicer stretch factor! Thanks for the chance to write about this, which I've never done at this level of detail!
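For readers who want to confirm that the "bonus" description really gives the same function, here is a small spot-check. It is a sketch only; the function names are invented for illustration, and exact fractions are used so that the comparison is not affected by rounding. Because all three expressions are quadratics, agreement at more than three points means they are identical.

```python
from fractions import Fraction

def f_vertical(x):
    # Reflect over the x-axis, stretch vertically by 2, shift left 2 and up 1.
    return -2 * (x + 2) ** 2 + 1

def f_both(x):
    # Horizontal stretch by 2 and vertical stretch by 8, with both reflections.
    return -8 * (-(Fraction(x) + 2) / 2) ** 2 + 1

def f_expanded(x):
    # The simplified form quoted in the answer.
    return -2 * x ** 2 - 8 * x - 7

samples = range(-6, 4)
all_agree = all(f_vertical(x) == f_both(x) == f_expanded(x) for x in samples)
print(all_agree)    # True
print(f_both(-4))   # -7, so (-4, -7) is on the graph
```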

We weren’t quite done. Alex wrote back:

Thanks so much! This makes a lot more sense now, but I do have one last question. You wrote: "If they had marked (1, 1) on the parent and (-4, -7) on the new graph, I might have described it as a horizontal stretch by 2, AND a vertical stretch by 8, and reflections in BOTH axes!" I understand why this would be a vertical stretch by 8. However, why exactly is this a horizontal stretch by 2?

I filled in the gap:

Going from (0, 0) to (1, 1) on the original graph, you go right 1 and up 1. Going from (-2, 1) to (-4, -7) on the new graph, you go left 2 and down 8. The horizontal distance is twice as far (and reversed), so we have a horizontal stretch by a factor of 2 (and reflection over the y-axis). The vertical distance is 8 times as far (and reversed), so we have a vertical stretch by a factor of 8 (and reflection over the x-axis). All together:

[Diagram: on the original graph, the path from the vertex to the marked point goes right 1 and up 1; on the transformed graph, it goes left 2 and down 8.]

We’ll first look at this question from 2009:

Square Root Functions and Transformations

How can we tell by looking at a graph if a square root function is being horizontally compressed or vertically stretched? For example, y = 2 sqrt(x) is being vertically stretched when graphed, but y = sqrt(2x) is being horizontally compressed. I'm confused because they both graphically look like they are being horizontally compressed or vertically stretched. However, my textbook makes a distinction for both cases. Ultimately, I would like to know how to tell the difference by merely looking at a graph.

Jon has been given two functions to graph, and astutely observed that the results of a vertical stretch and a horizontal compression look essentially alike. What will he do when he is given a graph and asked to determine the function?

The red graph is \(f(x)=\sqrt{x}\), and the red point is (4, 2). A vertical stretch produces the blue graph, \(g(x)=2\sqrt{x}\), with the point (4, 4) showing the doubling of *y*. A horizontal compression by the same factor produces the purple graph, \(h(x)=\sqrt{2x}\), with the point (2, 2) showing the halving of *x*. There is no visible difference in kind between the two effects; in fact, as Doctor Ali pointed out, we could have obtained the exact same graph two ways:

Hi Jon! Thanks for writing to Dr. Math. You asked a very nice question. The answer is, we can't tell! Note that in the example you gave, we can write

    y = Sqrt(2x)

as

    y = Sqrt(2) Sqrt(x) ~= 1.4142 Sqrt(x)

So, if we just have the graph, is it horizontally compressed by 1/2 or vertically stretched by 1.4142? We can't tell. Does that make sense? Please write back if you still have any difficulties.

That is, our purple graph above could have been obtained by vertical stretching, as shown here by the dot at \((4, 2\sqrt{2})\):

I added a comment:

It may be important to point out that not all functions behave like the square root. In general, horizontal compression and vertical stretch are NOT equivalent. (Try doing the same thing with a general linear function, y=ax+b.) In the case of the square root, it happens that we can describe the same transformation either way, making it an arbitrary choice, but that is not usually true.

So the bottom line for Jon is that it doesn’t matter: We could have described the transformation either way, and we would be correct. It is impossible to say how the originator of a given graph obtained it; but either equation is correct because they are equivalent: \(h(x)=\sqrt{2x}=\sqrt{2}\sqrt{x}\).

This is due to the product property of the square root: \(\sqrt{ax}=\sqrt{a}\sqrt{x}\). Functions that lack this, or a similar property, will not have this ambiguity.
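A quick numerical experiment illustrates both halves of this. The sketch below compares the two descriptions of the square root graph, and then tries the same idea on a line; the particular line y = 3x + 1 is an arbitrary choice made here, not one from the original discussion. If a horizontal compression of the line were a vertical stretch, the ratio of new height to old height would be the same at every x.

```python
import math

def h_horizontal(x):
    # Horizontal compression by a factor of 1/2: replace x with 2x.
    return math.sqrt(2 * x)

def h_vertical(x):
    # Vertical stretch by a factor of sqrt(2).
    return math.sqrt(2) * math.sqrt(x)

xs = [0, 0.5, 1, 2, 4, 9]
square_root_agrees = all(math.isclose(h_horizontal(x), h_vertical(x)) for x in xs)
print(square_root_agrees)  # True

def line(x):
    # A sample linear function y = 3x + 1 (an assumed example).
    return 3 * x + 1

# Ratio of compressed height to original height at several x values.
ratios = {line(2 * x) / line(x) for x in [0, 1, 2]}
linear_has_constant_ratio = len(ratios) == 1
print(linear_has_constant_ratio)  # False
```

Sampling a few points is not a proof, but the square root case is backed by the product property above, and a single pair of differing ratios is enough to rule out a vertical stretch for the line.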

Another function with a similar property is the square. This led to a different perspective on the issue in a 2014 question:

Shifting Images

When I change from a horizontal compression to a corresponding vertical stretch, why does the image point change? For example, consider f(x) = x^2.

Applying a horizontal compression of 1/3 and a shift left by 4 units yields y = [3(x + 4)]^2. The preimage point is (1, 1), and the image is (-3 2/3, 1).

Applying a vertical stretch of 9 and a shift left 4 units yields y = 9(x + 4)^2. This gives the exact same graph, but now (1, 1) goes to image (-3, 9).

Why? Why does basically the same transformation yield different image points for the same preimage?

Here we are combining a compression and a shift (using the format I recommended last time), then expanding into a form that pulls the 3 outside and squares it, changing the horizontal compression to a vertical stretch. These are the same function, but obtained by two different transformations. Here are the graphs and the points referred to, showing how the point (1, 1) transforms. First, horizontal compression (purple) followed by shift (blue):

Second, vertical stretch (purple) followed by shift (blue):

Both transformations have the same effect on the graph, but different effects on an individual point.
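To make that concrete, the two transformations can be written as maps on points. This is a sketch for illustration (the function names are invented here), using exact fractions so that -3 2/3 appears as -11/3.

```python
from fractions import Fraction

def compress_then_shift(point):
    # Horizontal compression by 1/3, then shift left 4.
    x, y = point
    return (Fraction(x) / 3 - 4, y)

def stretch_then_shift(point):
    # Vertical stretch by 9, then shift left 4.
    x, y = point
    return (x - 4, 9 * y)

def g(x):
    # The common transformed function, y = 9(x + 4)^2 = [3(x + 4)]^2.
    return 9 * (x + 4) ** 2

image_a = compress_then_shift((1, 1))   # x = -11/3, y = 1
image_b = stretch_then_shift((1, 1))    # x = -3,    y = 9

# The images differ, yet both lie on the same transformed graph.
both_on_graph = g(image_a[0]) == image_a[1] and g(image_b[0]) == image_b[1]
print(image_a != image_b, both_on_graph)  # True True
```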

I replied, initially just pointing this out:

These are not the same transformation. They are two different transformations that transform this particular graph into the same graph. The point (1, 1), which is on the original graph, is mapped to different points by the two transformations, but both are on the graph of the same transformed function. So the real question is, why do these two different transformations transform f into the same function?

Rhonda replied with a question like the previous one:

Thank you! Your answer raises the very question I still have: "Why do these two different transformations transform f into the same function?" I understand that the order in which transformations are performed can sometimes yield different results, and sometimes yield the same result. What concerns me is vertical and horizontal stretches and compressions. On some functions, like quadratics, a horizontal compression of 1/3 yields the same result as a vertical stretch of 9. But when applied to a sine function, a horizontal compression of 1/3 does not yield the same result as a vertical stretch of 3. Is there a "rule" as to when it matches up and when it doesn't?

The answer is similar to the last one, about square roots:

It's a special property of the function. In the case of f(x) = x^2,

    f(ax) = (ax)^2 = a^2x^2 = a^2f(x)

So a horizontal compression by 1/a is equivalent to a vertical stretch by a^2. This is not true for the sine.

Any function with this sort of property will have pairs of transformations as in your example. Other properties of functions might yield different relationships between transformations; for instance, a horizontal stretch of a log is the same as a vertical translation:

    log(ax) = log(a) + log(x)

This last comment points out a property of the logarithm function that makes a **horizontal stretch** equivalent to a **vertical shift**! I didn’t expand on that idea, but here is an example:

The function \(\log_2(2x)\) represents a horizontal shrink by a factor of 2, taking (2, 1) to (1, 1); but rewriting it as \(\log_2(2)+\log_2(x)= \log_2(x)+1\) makes it a vertical shift, taking (2, 1) to (2, 2).
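The two readings of \(\log_2(2x)\) can be compared numerically. In this sketch (function names chosen here for illustration), the point (2, 1) on \(y=\log_2 x\) is followed under each reading: the shrink halves its x-coordinate, while the shift raises its y-coordinate by 1, and both images land on the same new graph.

```python
import math

def as_shrink(x):
    # Horizontal shrink by a factor of 2: replace x with 2x.
    return math.log2(2 * x)

def as_shift(x):
    # Vertical shift up by 1.
    return math.log2(x) + 1

xs = [0.25, 0.5, 1, 2, 8, 100]
same_graph = all(math.isclose(as_shrink(x), as_shift(x), abs_tol=1e-12) for x in xs)
print(same_graph)        # True

# Images of the point (2, 1) under the two descriptions:
print(as_shrink(1))      # 1.0, so the shrink sends (2, 1) to (1, 1)
print(as_shift(2))       # 2.0, so the shift sends (2, 1) to (2, 2)
```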

Something similar is true of an exponential function.

In each of these cases, it would be easier in practice to do the vertical transformation rather than the horizontal, which tends to be more confusing (as in Rhonda’s example, where she masterfully handled the mix of two horizontal transformations, but it took extra effort, and many students would have done poorly). So these mutable transformations can provide ways to save trouble in graphing or in writing the formula from a graph.

Let’s look at a 2016 question that takes Rhonda’s last question a step further:

Dilation Designation: Why in the Family of Trigonometric Functions Rather than Quadratics?

Why are horizontal dilations usually included in trigonometric functions rather than in the family of quadratic functions or absolute value functions? Is it possible to generate ALL quadratic functions without needing to include horizontal dilations? If so, what is it about trig functions that requires them to have a horizontal dilation in order to create the whole family?

More explicitly, is there a mathematical reason trig function families are often described this way ...

    g(x) = a * f(bx + c) + d = a sin(bx + c) + d

... while quadratic and absolute value function families are usually described with merely this?

    g(x) = a * f(x - b) + c = a * (x - b)^2 + c

Is there something about the nature of the functions themselves that makes the b redundant in some families? Graphing the trig functions leads me to believe that b seems important. Does that have anything to do with the periodic nature of some functions? It seems like ALL absolute value functions and quadratic functions are possible without the b. But I can't even come close to proving it. I wouldn't even know where to start.

Mark is using the term “dilation” where I have been mostly using the more informal term “stretch or shrink/compress” in line with students’ questions. One reason to prefer his term is that one word covers both directions. Math terms often work this way.

Having seen the previous answers, you may see where we need to go with this. What I like to do, when a student asks “Why can’t I …” or “Why don’t I have to …” is to suggest that they try it and see. Here is my reply:

I'll add a horizontal dilation into your general form for a quadratic function:

    f(x) = a(bx - c)^2 + d

Now let's see if we can pull the b outside of the function:

    f(x) = a*(b(x - c/b))^2 + d     factoring out the b
    f(x) = a*b^2(x - c/b)^2 + d     distributing the exponent
    f(x) = A(x - B)^2 + d           let A = a*b^2, B = c/b

Those substitutions put the expression back into your general form, without needing a dilation. So a dilation amounts to multiplying the VERTICAL dilation by the square of the scale factor, and dividing the horizontal TRANSLATION by that factor.

So only one of the two dilations is required in order to describe any one transformation; since vertical dilations cause less trouble, we generally treat any dilation as vertical.
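The substitution A = a*b^2, B = c/b can be tried on concrete numbers. The sketch below is only an illustration: the helper name and the sample coefficients (3, 2, 5, 1) are assumptions made here, and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def absorb_horizontal_dilation(a, b, c, d):
    """Rewrite a*(b*x - c)**2 + d as A*(x - B)**2 + d and return (A, B, d).

    Assumes integer (or Fraction) coefficients with b nonzero.
    """
    if b == 0:
        raise ValueError("b must be nonzero")
    return a * b ** 2, Fraction(c, b), d

a, b, c, d = 3, 2, 5, 1
A, B, D = absorb_horizontal_dilation(a, b, c, d)
print(A, B, D)  # 12 5/2 1

# Both forms give the same values, so they describe the same parabola.
forms_agree = all(
    a * (b * x - c) ** 2 + d == A * (x - B) ** 2 + D
    for x in range(-5, 6)
)
print(forms_agree)  # True
```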

I encourage you to play with the graph of a quadratic and see how this works. Take any quadratic function and stretch it horizontally, then see what the new function is in your general form. For example, stretching a parabola horizontally by a factor of 2 widens it to a shape you could also get by compressing it vertically by a factor of 4. At the same time, if the original had a horizontal shift, since the stretch was centered on the y-axis, you will have doubled the shift.

So you can see the blue parabola as either “fatter” (horizontally widened, by a factor of 3) or “flatter” (vertically shrunk, by a factor of 9) than the red one:

The general formula for a quadratic function makes that decision for us, going with the vertical. If there were a horizontal translation, it would simply be the location of the vertex, with no adjustment needed for the stretching.

Next, try the same algebraic work I did above with an absolute value function, to see a similar effect. Both functions x^2 and |x| have the property that

    f(ab) = f(a)f(b)

This is the key that makes it possible to pull the multiplier outside. As a result, we don't NEED horizontal dilations (which many people consider the trickiest of transformations), because they don't give us anything we can't do without them. (You can still, of course, be given a quadratic function including a horizontal dilation, and need to be able to graph it.)

The same is true of various other functions, such as 1/x; but it is NOT true of any trig functions. Can you see why a periodic function couldn't have the necessary property? It's not periodicity directly ...

Thanks for a fun question.

One name for functions with this property is “multiplicative functions”; you might think of them as distributing over multiplication. The property holds for *any* power of *x*, not only \(x^2\) and \(x^{1/2} = \sqrt{x}\) and \(x^{-1} = \frac{1}{x}\).
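The property f(ab) = f(a)f(b) is easy to test on sample values. The following sketch (the helper name and the sample pairs are choices made here) checks it for a few of the functions mentioned and for the sine. Passing on a handful of pairs is only evidence, not proof, whereas a single failing pair does show that the property fails.

```python
import math

def is_multiplicative(f, pairs):
    # Check f(a*b) == f(a)*f(b), up to rounding, on each sample pair.
    return all(math.isclose(f(a * b), f(a) * f(b)) for a, b in pairs)

pairs = [(2.0, 3.0), (0.5, 4.0), (1.5, 2.5)]

print(is_multiplicative(lambda x: x ** 2, pairs))   # True
print(is_multiplicative(abs, pairs))                # True
print(is_multiplicative(lambda x: 1 / x, pairs))    # True
print(is_multiplicative(math.sqrt, pairs))          # True
print(is_multiplicative(math.sin, pairs))           # False
```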

I have one more interesting interaction. This question is from 2017, and covers some of what we discussed last time, together with a new issue:

Order of Transformations of a Function, Yet Again

In school, we learned about absolute value translations, but we never covered what would happen if y = |-x|. In that particular equation, nothing changes. But what if y = |-x + 1|? I have found that y = |-x + 1| translates the graph to the right instead of the left. And what happens if y = |-2x + 1|? None of this is covered in school.

There are really two issues here. I replied:

The issues you are noticing are not really related to the absolute value so much as to combining two horizontal transformations.

As you note, y = |-x| looks the same as y = |x|. This is because the absolute value function has symmetry about the y-axis, so that reflection over the y-axis has no effect.

For your specific case, note that

    |-x + 1| = |-(x - 1)| = |x - 1|

As you say, this is translated to the right rather than to the left.

Here, as I have done before, I rewrote the shift-first form to the stretch-first form, which in this case is reflect-first. So we first reflect the graph about the *y*-axis, which has no effect due to symmetry, and then shift.

But this is a special case of something you need to know for all functions. In general, for ANY function f, if you don't take the reflection into account, f(-x + 1) is translated "the wrong way." The symmetry of the absolute value hides the reflection, which means you don't see what's really happening. So let's focus on the general case.

The rest is a recap of the general principles I discussed last time, with a little more going on than in any example there:

I like to think of these transformations step by step. And when we are working with the horizontal transformations (translate and stretch horizontally, and reflect over the y-axis), we have to be careful about order, which, like everything else, works backwards. You have to think of each transformation as REPLACING the variable with an expression:

    f(x)         original graph
    f(x + 1)     translated left 1
    f(-x + 1)    reflected over y-axis, which moves it from left to RIGHT!

The reflection step replaces x with -x, so it has to be done last to get the function we want.

To make things more natural (because our minds naturally tend to see it this way), we can do the reflection first, which will result in a different form for the function:

    f(x)           original graph
    f(-x)          reflected over y-axis
    f(-(x - 1))    translated RIGHT 1

Do you see how this is similar to what I did with the absolute value function?

Now, to your second equation:

    y = |-2x + 1|

Here is a sequence of transformations to obtain it:

    y = |x|          original graph
    y = |x + 1|      translated left 1
    y = |-x + 1|     reflected over y-axis (DOES have an effect)
    y = |-2x + 1|    shrunk by factor of 1/2 horizontally

If you carry these out carefully in this order, you will get the right graph (which you can verify by checking a few points). But it is easy to get it wrong.

You can instead rewrite it in factored form as

    y = |x|                original graph
    y = |-x|               reflected over y-axis (no effect)
    y = |-2x|              shrunk by factor of 1/2 horizontally
    y = |-2(x - 1/2)|      translated RIGHT 1/2

Again, this is the same as y = |2(x - 1/2)| with no reflection.
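The idea of "replacing x" can be mirrored directly in code, where each horizontal transformation wraps the previous function. This is a sketch with helper names invented for the purpose; it builds y = |-2x + 1| by both sequences described in the reply and compares them with the target at a few points.

```python
def shift_left(f, k):
    # Replace x with x + k (a negative k shifts right).
    return lambda x: f(x + k)

def reflect_y(f):
    # Replace x with -x.
    return lambda x: f(-x)

def shrink_h(f, k):
    # Replace x with k*x (horizontal shrink by a factor of 1/k).
    return lambda x: f(k * x)

def target(x):
    return abs(-2 * x + 1)

# Shift first: |x| -> |x + 1| -> |-x + 1| -> |-2x + 1|
shift_first = shrink_h(reflect_y(shift_left(abs, 1)), 2)

# Factored form: |x| -> |-x| -> |-2x| -> |-2(x - 1/2)|
factored = shift_left(shrink_h(reflect_y(abs), 2), -0.5)

xs = [-2, -1, 0, 0.5, 1, 3]
sequences_agree = all(shift_first(x) == target(x) == factored(x) for x in xs)
print(sequences_agree)  # True
```

Note that the transformation applied last is the outermost wrapper, which is one way to see why the last replacement acts on the bare x.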

Now we’re ready for a bigger problem that I will look at next time.

We’ll spend most of our time with the following question, from 2004:

Order of Transformations of a Function

Could you please tell me in what order I would perform transformations such as -f(x), f(-x), af(x), f(ax), f(x)+a, f(x+a) if two or more were to be applied to f(x)? As an example, if I had f(ax+b) would I do the translation or the stretch first? I've looked in many textbooks and have been unable to find an answer.

Ozgur has learned about each individual transformation (respectively, vertical and horizontal reflections, vertical and horizontal stretches or shrinks, and vertical and horizontal shifts); but now wants to be able to read a function and determine the correct sequence of transformations. Which comes first? (And the example he gives is the hardest case.)

Since this is a favorite topic of mine, I answered:

Good question! I haven't seen this treated well in textbooks, either, but it's an important topic.

You can perform transformations in any order you want, in general. But in this case, you are asking in which order to do them in order to transform f(x) into a specific goal, f(ax+b). The order makes a difference in how you get there.

What I do is to explicitly write the steps, one at a time. Suppose we first do the horizontal shrink f(x) -> f(ax). If we then apply a horizontal shift (translation) b units to the left, we would be REPLACING x in f(ax) with x+b, and we'd get f(a(x+b)). That is NOT what we are looking for; it's equal to f(ax+ab). So this order of doing those particular transformations is wrong.

I will be reiterating the key idea several times: the horizontal transformations (which affect the input to the function) should be thought of as replacing *x* with a new expression. I’ll also be emphasizing later some details on what each transformation does to the graph. As I said here, transformations can be applied in any order, but changing the order changes the result, so the trick is to find the order that results in the desired transformed function.

Here, shrinking first, changing \(f(x)\) to \(g(x) = f(ax)\), and then shifting, \(h(x) = g(x + b) = f(a(x+b))\), didn’t result in the desired function \(h(x) = f(ax+b)\).

If instead we first do the shift, changing f(x) to f(x+b), and THEN do the shrink, we replace x in x+b with ax, and get f(ax+b), which is what we want. So that is one answer:

    Start with:                             f(x)
    Shift b units to the left:              f(x+b)
    Shrink horizontally by a factor of a:   f(ax+b)

When I want to be sure of the order, I always take it step by step like this.

But in fact we COULD do the two transformations in the other order, if we change the particular amounts. We can write f(ax+b) as f(a(x+b/a)), factoring out the a, and then do this:

    Start with:                             f(x)
    Shrink horizontally by a factor of a:   f(ax)
    Shift b/a units to the left:            f(a(x+b/a)) = f(ax+b)

We would do this because of what the first example showed, that this order resulted in the factored form, with parentheses, so we used that form. As we’ll see later, some books teach this form as their routine method; I approve of that because this order of transformations works better for many students.

There are a couple things to notice here. First, let's visualize what each pair of transformations does. The first takes the graph of f and moves it left; then shrinks it:

[Diagram: three sketches in a row showing the original graph of f, the graph shifted left, and the shifted graph shrunk horizontally.]

The other shrinks first, and then shifts--but not as far, since the shrink reduced the distance it has to go:

[Diagram: three sketches in a row showing the original graph of f, the graph shrunk horizontally, and the shrunk graph shifted left.]

To put it another way, the shrink in the second version also moved the starting point of the graph I drew (by shrinking the empty space), so I had to shift it less to get to the destination graph.

Let’s look at actual graphs of a specific function. I’ll start with \(f(x) = x^3\) (red), and aim for \(h(x) = (\frac {x}{2}-1)^3\) (blue). The shift-first method is to shift right 1 unit (purple), then stretch by a factor of 2, both numbers being read from the given form:

The stretch-first method is based on rewriting the function as \(h(x) = (\frac {x-2}{2})^3\). We first stretch by a factor of 2, then shift right by 2 units:

Many students find it more natural to do the stretch first, especially when they are reversing the problem, trying to recognize the transformations given the graph (a problem we’ll be looking at next time). When you carry out the stretch in the first form, you must be aware that the stretch starts at the axis, doubling distances from there, rather than from the “center” of the basic graph. As a result, the “center” of the graph ends up moving farther than the shift would suggest (to 2, rather than just to 1).
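The two routes to \(h(x)\) can be checked by composing the replacements in code. This sketch uses a made-up helper, `replace_x`, that performs one horizontal transformation by substituting an expression for x; it also includes the tempting wrong route (stretch by 2, then shift right by only 1) for comparison.

```python
def cube(x):
    return x ** 3

def replace_x(f, new_x):
    # One horizontal transformation: replace x in f(x) with new_x(x).
    return lambda x: f(new_x(x))

def target(x):
    return (x / 2 - 1) ** 3

# Shift first: right 1 (x -> x - 1), then stretch by 2 (x -> x / 2).
shift_first = replace_x(replace_x(cube, lambda x: x - 1), lambda x: x / 2)

# Stretch first: stretch by 2 (x -> x / 2), then right 2 (x -> x - 2).
stretch_first = replace_x(replace_x(cube, lambda x: x / 2), lambda x: x - 2)

# Wrong amounts: stretch by 2, then right 1, which gives ((x - 1) / 2)^3.
wrong_order = replace_x(replace_x(cube, lambda x: x / 2), lambda x: x - 1)

xs = [-4, -2, 0, 1, 2, 6]
routes_agree = all(shift_first(x) == target(x) == stretch_first(x) for x in xs)
print(routes_agree)                  # True
print(wrong_order(2), target(2))     # 0.125 0.0
```

The center of the blue graph is where the target is zero, at x = 2, which matches the observation that the center moves to 2 rather than 1.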

A second point to make is that the order of operations determines the order of the transformations. When we wrote the transformed function as f(ax+b), the operations inside the function were done in the order "multiply, then add"; since we are replacing x in each transformation, we ended up doing the last operation first (outside in), replacing x with x+b first, then with ax, so we did the shift followed by the shrink. When we wrote it as f(a(x+b/a)), the order of operations says the addition comes first, then the multiplication, so the transformations have to be done in the opposite order.

This applies to transformations of x, on the inside of the function. Transformations like a+bf(x) (vertical shifts and stretches) are done in the SAME order as the order of operations. Also, these do not interact with the horizontal transformations, so it doesn't matter which order you do them in; if you had, say, af(bx) you could do the vertical stretch followed by the horizontal shrink, or vice versa. Try it and see!
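Taking up the invitation to "try it and see", here is a small experiment. It is a sketch under assumptions chosen here: the sine is used as a sample f, the numbers 3, 1, and 2 are arbitrary, and the helper names are invented. It shows that vertical steps follow the order of operations, and that a vertical stretch and a horizontal shrink can be applied in either order.

```python
import math

def stretch_v(g, k):
    # Vertical stretch: multiply the output by k.
    return lambda x: k * g(x)

def shift_up(g, k):
    # Vertical shift: add k to the output.
    return lambda x: g(x) + k

def shrink_h(g, k):
    # Horizontal shrink: replace x with k*x.
    return lambda x: g(k * x)

f = math.sin
xs = [-2.0, -0.5, 0.0, 1.0, 2.5]

# Goal 1 + 3 f(x): multiply first, then add, just as the expression is evaluated.
stretch_then_shift = shift_up(stretch_v(f, 3), 1)   # 3 f(x) + 1
shift_then_stretch = stretch_v(shift_up(f, 1), 3)   # 3 (f(x) + 1) = 3 f(x) + 3

vertical_order_matters = any(
    not math.isclose(stretch_then_shift(x), shift_then_stretch(x)) for x in xs
)
print(vertical_order_matters)  # True

# Goal 3 f(2x): vertical stretch and horizontal shrink in either order.
v_then_h = shrink_h(stretch_v(f, 3), 2)
h_then_v = stretch_v(shrink_h(f, 2), 3)
vertical_and_horizontal_commute = all(v_then_h(x) == h_then_v(x) for x in xs)
print(vertical_and_horizontal_commute)  # True
```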

We can summarize the order of transformations this way:

I haven’t really discussed reflections yet, but since they also amount to multiplication (by -1), they are done along with the stretch.

In 2007, Elizabeth wrote asking for clarification:

I read your explanation of transformations order of operations, but I guess I'm still unclear on the order that I have to do them in. For example, if I take the equation y = 4 sqrt(2-x), I find that I get the correct graph by doing 1) reflection over y axis 2) horizontal shift of 2 3) vertical stretch of 4 OR 1) vertical stretch 2) reflection 3) horizontal shift. Either way, the horizontal shift has to come after the reflection. It doesn't work if I 1) shift then 2) reflect or stretch 3) reflect or stretch (depending on what I did in step 2). I was always taught to do the horizontal shift first! According to your explanation, as long as what I'm doing yields the correct final equation, I can do whatever order I need to get there. So looking at what didn't work, if I shift first I get sqrt(x-2). Then reflect makes sqrt(-1)(x-2) or in other words sqrt(-x+2) which is the same as sqrt(2-x), then finally vertical stretch yields what I want: y = 4 sqrt(2-x). But this doesn't graph correctly; the reflection must come before I do the horizontal shift. Why?

Her example is a little more complicated because of the additional reflection; it will be a welcome addition here, to make everything more concrete. I replied:

Your question brings up details I didn't emphasize in my original answer.

First, I want to point out clearly what each individual transformation looks like, because there are a few points that are easily missed. Here are the transformations mentioned on that page:

| Function | Transformation |
|---|---|
| -f(x) | reflection in the x-axis |
| af(x) | vertical stretch by factor a |
| f(x)+a | vertical shift up by a |
| f(-x) | reflection in the y-axis |
| f(ax) | horizontal shrink by factor a |
| f(x+a) | horizontal shift left by a |

Note that the first set, the "vertical" transformations, involve changing something OUTSIDE the original function; that is, we do something to the "y" that comes out.

The second set, the "horizontal" transformations, involve changing something on the INSIDE of the function. This MUST be seen as REPLACING the x itself with a negative, multiple, or sum. These are the tricky ones. (They are also the ones that tend to act in the opposite direction to what many people expect, shrinking or shifting left rather than stretching or shifting right.)

Last time, I pointed out the idea of replacing *x*, but it only becomes essential here, with a combination of horizontal transformations. Her example is a perfect one to demonstrate:

Let's take an example of a reflection in the y-axis for a relatively complicated (compound) function. If

f(x) = sqrt(x - 2)

then reflection in the y-axis involves NOT replacing x-2 with its opposite, 2-x, but replacing x with its opposite:

g(x) = f(-x) = sqrt(-x-2)

Try graphing this to see for yourself what happens. A point (x,y) that satisfies y=f(x), reflected in the y-axis to the point (-x,y), will satisfy y=g(x) (using the new value of x). If you graph sqrt(2 - x), you'll see something different.

This, again, is the key point: \(f(-x)\) doesn’t mean “multiply the argument of the square root by -1”; it means “replace *x* with –*x*“.
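To see the difference numerically, here is a small Python sketch. The function names are mine, chosen only for illustration; `not_the_reflection` is the tempting wrong answer, in which the whole argument of the square root is negated.

```python
import math

def f(x):
    """f(x) = sqrt(x - 2), defined for x >= 2."""
    return math.sqrt(x - 2)

def g(x):
    """Reflection in the y-axis: replace x with -x, giving sqrt(-x - 2)."""
    return f(-x)

def not_the_reflection(x):
    """Negating the whole argument instead: sqrt(2 - x), defined for x <= 2."""
    return math.sqrt(2 - x)

# (6, 2) is on the graph of f, so its mirror image (-6, 2) should be on g.
print(f(6), g(-6))

# sqrt(2 - x) does not pass through (-6, 2) ...
print(not_the_reflection(-6))

# ... but it does pass through (-2, 2), the mirror image of (6, 2) in the line x = 2.
print(not_the_reflection(-2))
```

Note that `g` is only defined for *x* ≤ -2 (calling `g(6)` raises a `ValueError`), whereas sqrt(2 - *x*) is defined all the way up to *x* = 2; the two graphs cannot be the same.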

Again, making one transformation at a time makes this clearer:

Now, let's break your function down into a series of transformations, starting with the basic square root function:

f1(x) = sqrt(x)

and heading toward our goal,

f(x) = 4 sqrt(2 - x)

It doesn't matter how the vertical and horizontal transformations are ordered relative to one another, since each group doesn't interact with the other. Let's do all the horizontal transformations first, since they're the awkward ones. We need to do something with the 2, and something with the -x. As mentioned in the page above, it's most convenient (because of the order of operations) to do the shift FIRST:

f1(x) = sqrt(x)

f2(x) = f1(x + 2) = sqrt(x + 2) [shift left by 2]

f3(x) = f2(-x) = sqrt(-x + 2) [reflect in y-axis]

Now we have f3(x) = sqrt(2 - x), and we can apply the vertical transformation:

f4(x) = 4 f3(x) = 4 sqrt(2 - x) [stretch vertically by 4]

As before, I’m doing the shift first primarily because of the way the function is written. I could instead have first rewritten it with the function argument in factored form, as I will do later.

In terms of the order of operations, our function looks like this:

So we find that we can shift left by 2, then reflect in the y-axis, and then stretch vertically by 4. And that sequence of transformations works on the graph:

[Sketch, four panels: the graph of sqrt(x); shifted left by 2; reflected in the y-axis; stretched vertically by 4]

This is the right graph; for example, f(2) = 4 sqrt(2 - 2) = 0, and f(0) = 4 sqrt(2 - 0) = 4 sqrt(2), which is about 5.66.

Here is a better graph, showing the transformation from \(f_1(x)=\sqrt{x}\) (red) to \(f_2(x)=\sqrt{x+2}\) (purple) to \(f_3(x)=\sqrt{-x+2}\) (dotted blue) to \(f_4(x)=4\sqrt{2-x}\) (solid blue):

Note how I checked my graph by choosing a point; after drawing the graph, I might observe that (1, 4) is on the final graph, and verify that \(f_4(1) = 4\sqrt{2-1} = 4\) as it should.
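The same kind of check can be done by building the function one replacement at a time. This is only a sketch: the names `f1` through `f4` follow the steps quoted above, while `target` and the sample points are my own additions.

```python
import math

def f1(x):
    return math.sqrt(x)      # basic square root function

def f2(x):
    return f1(x + 2)         # replace x with x + 2: shift left by 2

def f3(x):
    return f2(-x)            # replace x with -x: reflect in the y-axis

def f4(x):
    return 4 * f3(x)         # multiply the output by 4: vertical stretch

def target(x):
    return 4 * math.sqrt(2 - x)

samples = [-7, -2, 0, 1, 2]
chain_matches_target = all(math.isclose(f4(x), target(x)) for x in samples)

print(chain_matches_target)
print(f4(2), f4(1), round(f4(0), 2))
```

The three printed values correspond to the points (2, 0), (1, 4), and (0, 4√2) used above to check the graph.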

I closed by commenting on Elizabeth’s description in her last paragraph:

You seem to have fallen into the trap I mentioned, negating the entire argument of the sqrt, rather than replacing x with -x, and thinking that you have reflected the graph in the y-axis. Be sure to graph the function you got, and convince yourself that it is not the reflection you think it is. The functions sqrt(x-2) and sqrt(2-x) both have x-intercept x=2, so in fact you have reflected around the line x=2!

Similarly, I suspect that when you did the shift after the reflection, you were doing some wrong transformations, and not actually graphing what you got and seeing that it really worked. Please let me know if I'm wrong about that.

She responded,

Aha! You are absolutely correct. [of course :) ] I was improperly applying the transformation. Your key phrase "This MUST be seen as REPLACING the x itself with a negative, multiple or sum" cleared it up for me. Thank you for your time and clear explanation!

This 2007 page gives another example:

Does the Order Matter When Transforming a Function?

The next question, from 2017, faces the issue I mentioned about seeing the transformations of the graph incorrectly.

Order of Transformations of a Function, Redux

I'm having difficulty interpreting combinations of horizontal shifts, shrinks, and stretches. I understand how they work individually, such as how the scalar in 3x^2 makes the function three times steeper, etc. But how would you solve a combination of them together?

I saw a related Dr. Math conversation transform shifts in this order: Start with f(x). Shift b units to the left: f(x + b). Shrink horizontally by a factor of a: f(ax + b).

Let's say we have the equation f(x) = (3x - 9)^2. I think the steps of the transformations would be: Start with f(x). Replace x by (x - b): f(x - b). In this case, it would be x - 9. Finally, replace the x in f(x - b) with ax. In this case, it would be 3, which would result in f(3x - 9) = (3x - 9)^2

My reasoning is that it would shift 9 spaces and THEN shrink 3 times its original size. BUT the graph of f(x) = (3x - 9)^2 shows that it only shifted 3 units to the left, not 9. I am confused why it shifts only 3 spaces. Why not nine? I don't understand how this works!

All of the work here is correct, except for the misunderstanding of what the graph shows. I replied:

You correctly broke down the transformations of the function; but you are not looking at the transformed graph properly. This is a common difficulty (especially in drawing the graph by hand, or finding the shift in a given graph), and for this reason I prefer to change the order -- it makes the graph easier to work with. So I'll first look at what you are doing, and then do it differently.

You know that the graph is shifted 9 units to the right; but this is BEFORE the shrink. When you shrink the resulting graph, you are expecting the graph to still show the same shift. But in fact, the shift itself has been shrunk. What is happening is that every point, including the vertex that you are looking at, is being moved to 1/3 as far from the y-axis as it was; so what WAS a shift by 9 units is now only 3 units.

Ayush is looking at the graph and seeing a shift of only 3 units; but that is the shift you would have done if you shrunk the graph first and then shifted.

Here is the graph of what Ayush has done, showing the intermediate step of shifting right by 9 units (purple) before shrinking (blue):

To see this, consider how each transformation moves the point (0, 0) in the original graph: Start with: (0, 0) Shift 9 units to the right: (9, 0) Shrink horizontally by a factor of 3: (3, 0) So the vertex is only 3 units to the right of where it was, because the shift of 9 was divided by 3.

Of course, just checking a couple points can also confirm that the graph is correct, even if you are unsure of the transformations. But there’s more:

What you are doing, which I find most students naturally do, is to picture the shrinking as centered on the axis of the graph, as a real object would. But the transformation shrinks the distance of every point from the y-axis -- not the distance from the axis of the parabola. When you shrink an already-shifted graph, your mind does not want to see it that way! So, as I mentioned, I like to arrange things so that we do the shrink first, which is easier to see.

So consider what happens if we rewrite the function. I'll call it g to distinguish it from the original function f(x) = x^2. Factoring out the 3 on the inside,

g(x) = (3x - 9)^2 = f(3x - 9) = f(3(x - 3)) = (3(x - 3))^2

Do you see how the 9 became a 3? By changing the order of operations using the parentheses, we have also changed the order of the transformations:

| Step | Function | Formula | Vertex |
|---|---|---|---|
| Start with | f(x) | x^2 | (0, 0) |
| Shrink horizontally by 3 | f(3x) | (3x)^2 | (0, 0) |
| Shift 3 units to the right | f(3(x - 3)) | (3(x - 3))^2 | (3, 0) |

Here we first replaced x with 3x, which shrinks the graph about the y-axis; then we replaced x with (x - 3), which shifts the graph right. Since the point of reference, the vertex, was still on the y-axis when we did the shrink, it was not affected; and now the effect of the shift is directly visible.

Here is a graph of this process, where we first shrink (purple) and then shift (blue):

It is this shift that people tend to see when they look at a graph, because we think of the graph as getting thinner with respect to itself, not of the whole plane shrinking toward the axis.
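Both orders can be traced point by point. In this Python sketch the helpers `shift_right` and `shrink_horizontally` are hypothetical names of my own; they act on points of the graph, so they add to *x* or divide *x*, which is the opposite of what appears inside the function.

```python
import math

def shift_right(point, distance):
    """Move a point of the graph to the right."""
    x, y = point
    return (x + distance, y)

def shrink_horizontally(point, factor):
    """Move a point to 1/factor of its distance from the y-axis."""
    x, y = point
    return (x / factor, y)

def g(x):
    return (3 * x - 9) ** 2

vertex = (0, 0)  # vertex of f(x) = x^2

# Shift 9 first, then shrink by 3: the shift itself gets shrunk.
shift_first = shrink_horizontally(shift_right(vertex, 9), 3)

# Shrink by 3 first, then shift 3: the vertex stays put during the shrink.
shrink_first = shift_right(shrink_horizontally(vertex, 3), 3)

print(shift_first, shrink_first)

# Any other point of y = x^2, such as (1, 1), should also land on the graph of g.
moved = shrink_horizontally(shift_right((1, 1), 9), 3)
lands_on_g = math.isclose(g(moved[0]), moved[1])
print(lands_on_g)
```

Both routes put the vertex at (3, 0), which agrees with g(3) = 0.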

Translation means moving an object without rotation, and can be described as “sliding”. In describing transformations of graphs, some textbooks use the formal term “translate”, while others use an informal term like “shift”.

Our first question comes from 1998:

Translating Functions Explain how the following graphs are obtained from the graph of y = f(x): y = f(x - 5) y = -f(x) y = f(5x)

These examples represent the three main transformations: **translation** (shifting), **reflection** (flipping), and **dilation** (stretching). I chose to focus on the first only, suggesting how the student could discover what a transformation does to the graph:

Here's how to think about it. Imagine you have just graphed a point of f(x), say at x = k:

[Sketch: a point plotted on the graph of f, above x = k]

Now you want to graph a point of f(x - 5) using what you just found out. Well, you know f(k). If x = k + 5, f(x-5) = f(k) which you just figured out. So the corresponding point in the new function to plot is x = k + 5, where it will have the same value:

[Sketch: the original point above x = k, and the new point at the same height above x = k+5]

If you do this with every point in the graph, you will find that the graph of f(x - 5) is just the graph of f(x) slid to the right by 5.

The other two cases are similar. Find what point of the new graph feeds the same value into f as the original, and what y is for the new function.

You may have seen this before if you had an equation like:

y = k*(x-a)^2 + b

and had to find the vertex of the parabola. Notice that this is just:

y = f(x-a) + b

where:

f(x) = k*x^2

whose vertex is at (0,0). So the vertex of the original equation is at (a,b). Knowing how transforming an equation transforms its graph is very useful.

Students often meet the standard form (vertex form) of the parabola before learning about transformations, so my example should be familiar; the vertex is (*a*, *b*) because the basic function is shifted *a* units to the right, and *b* units up. We’ll be seeing more of this soon.

Here is another very similar question from 2001:

Graph with f(x) I am told to sketch the following equations, but do not know how to: y = f(x)+ 2 y = f(x-3) y = 2f(x)

This time we have a **vertical translation**, a **horizontal translation**, and a **vertical dilation**. I chose to illustrate each concept with sample graphs, with only brief explanation of why they do what they do:

Were you told what f(x) is? Without that, you can't really sketch the graph. But if you were given just the graph of f(x), say some random squiggly line, you can graph these by seeing how they are related to f(x).

[Sketch: a squiggly curve labeled f(x)]

g(x) = f(x)+2 is 2 units higher than f(x) for any x; so you just shift the graph upward 2 units:

[Sketch: the curve f(x) and a copy labeled f(x)+2, 2 units higher]

h(x) = f(x-3) has the same value for a given value of x that f(x) has when x is 3 less. That is, the graph is shifted to the right three units, so that for example h(3) = f(3-3) = f(0).

[Sketch: the curve f(x) and a copy labeled f(x-3), 3 units to the right]

k(x) = 2f(x) is twice as high as f(x) at any given value of x; so you are stretching the graph vertically:

[Sketch: the curve f(x) and a copy labeled 2f(x), twice as tall]

Here are some actual graphs corresponding to the three above:

\(f(x)+2\) [shift up 2]: Here, the point (2, 4) moves to (2, 6), adding 2 to the *y*.

\(f(x-3)\) [shift right 3]: Here, the point (2, 4) moves to (5, 4), adding 3 to the *x*.

\(2f(x)\) [stretch vertically by a factor of 2]: Here, the point (2, 4) moves to (2, 8), doubling the *y*.

None of these discussions went deeper into reflections than a brief mention in the first question. I will just add here that you can think of a reflection as a “stretch by a factor of -1”. That is, it just reverses direction. So a vertical reflection (reflection in the *x*-axis) is accomplished by \(-f(x)\), which changes the sign of *y*; and a horizontal reflection (reflection in the *y*-axis) is accomplished by \(f(-x)\), which changes the sign of *x*. These generally don’t give students any trouble, apart from remembering which is which. We’ll be seeing them again next time, because they do give trouble when we combine transformations.

The horizontal transformations, involving *x*, confuse many students. Here is a question from 2002 about just that:

Shifting Graphs Here's a passage I don't understand. "If g(x)=f(x-c), where c>0 then the value of g at x is the same as the value of f at x-c (c units to the left of x). Therefore, the graph of y=f(x-c) is just the graph of y=f(x) shifted c units to the right." I don't understand why it shifts to the right.

I referred to the last answer, and gave a little more detail:

Suppose you have just plotted a point (a,b) on the graph of y=f(x), and now you want to plot the same point on the graph of y=g(x). You have already calculated f(a) and got b, so you would like to reuse that information and save work. Since g(x) = f(x-c), you know that g(a+c) = f(a+c-c) = f(a) = b. So the point (a+c, b) will be on the new graph. This point is your original point (a,b)shifted c units to the right. The same will be true for every point on the graph of y=f(x), so the whole curve will be shifted c units to the right to make the graph of g.

We have to add *c* to *x* to compensate for the fact that it will be decreased by *c* before being fed into function *f*. So replacing \(x\) with \((x-c)\) in the function moves every point to the right by *c* units.
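Here is a minimal Python sketch of that argument. The helper name `shifted_right` is my own, and I am assuming \(f(x) = x^2\) with \(c = 3\) purely as an example.

```python
def shifted_right(f, c):
    """Return g with g(x) = f(x - c)."""
    def g(x):
        return f(x - c)
    return g

def f(x):
    return x ** 2   # assumed example function

c = 3
g = shifted_right(f, c)

a = 2
b = f(a)            # (a, b) = (2, 4) is on the graph of f

# g needs an input that is c larger in order to feed the same value a into f.
print(g(a + c) == b)   # (a + c, b) = (5, 4) is on the graph of g
print(g(a) == b)       # the original point (2, 4) is generally not on g
```

The input to g has to be larger by *c* to produce the same output, so each point of the graph sits *c* units farther right.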

In general, everything we do with *x* will be the opposite of what you might expect, for this same reason. This is true not only of horizontal shifts, but of horizontal stretching as well, which we haven’t seen yet. Here is a question specifically about that issue, from 2004:

Dilations of the Graph of y = f(x)

Why is it that when doing a horizontal shrink or stretch you multiply by the reciprocal but when doing a vertical stretch or shrink you multiply by just the number? For example, to stretch y = f(x) vertically by a factor of 2 we just use y = 2*f(x), but to stretch it horizontally by a factor of 2 we use y = f(x/2). Why isn't it y = f(2x)?

Here is an example, based on our function above:

The dotted graph is *f*(2*x*), **compressed** (shrunk) by a factor of 1/2 horizontally; the point (2, 4) moves to (1, 4), halving the value of *x*.

The dashed graph is *f*(*x*/2), **stretched** by a factor of 2 horizontally; the point (2, 4) moves to (4, 4), doubling *x*.
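These two point movements are easy to confirm with a short Python sketch. I am assuming here that the function in the graphs is \(f(x) = x^2\), which is consistent with the point (2, 4) but is not stated explicitly; the names `compressed` and `stretched` are mine.

```python
def f(x):
    return x ** 2          # assumed parent function, since (2, 4) is on its graph

def compressed(x):
    return f(2 * x)        # replace x with 2x

def stretched(x):
    return f(x / 2)        # replace x with x/2

# The height 4 that f reaches at x = 2 is reached by f(2x) at x = 1
# and by f(x/2) at x = 4.
print(f(2), compressed(1), stretched(4))
```

Multiplying *x* by 2 inside the function means the graph reaches each height at half the original *x*, and dividing by 2 means it reaches it at twice the original *x*.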

I first looked at the more natural vertical transformations from a new perspective:

There's a different way to express the dilations (stretching and shrinking) that makes the two look more similar. As in your example, say we have an equation

y = f(x)

If you stretch the graph vertically by a factor of k, the new y will be k times what it was for a given x:

y = k * f(x)

But you can also think of it as replacing y in the original equation with y/k:

y/k = f(x)

That means that y for the new equation has to be k times as great as in the original equation, in order to be on the graph; dividing it by k gives you the equivalent "y" in the original equation, which is why we replace y with y/k to get the new equation.

The usual perspective multiplies the function value, obviously stretching it; here we are instead **replacing** *y* with *y*/*k*, forcing *y* to be *k* times as large.

Now let's stretch horizontally by a factor of k. That can be done by replacing x with x/k this time:

y = f(x/k)

When thought of in that way, the two approaches are the same--only the variable replaced changes depending on if you dilate vertically or horizontally.

We can do the same with translations, providing a different way to see the “oppositeness” of the horizontal stretch:

As an aside, a similar discussion holds for simple translations. The usual approach uses the notation: y = f(x - h) + k To move the graph 3 units right and 2 units up, we would have: y = f(x - 3) + 2 Again, students wonder why the upward shift of 2 is done with +2 while the shift to the right is done with -3. That doesn't seem consistent, either. But if we do the same thing and rewrite the equation as y - k = f(x - h) we can see that the two translations are in fact accomplished the same way.

This last form may look familiar to you; it is the same form as the “point-slope” form of a line.

In this view, any transformation is accomplished by replacing a variable with a new expression that does the opposite of the transformation.

This general issue came up again in 2013:

Functional Transformations Explained and Undone Why do horizontal transformations behave in an opposite manner? For example, why does f(x + c) shift a graph c units to the left? Likewise, to shrink or stretch horizontally, i.e., f(cx), why do you divide the x coordinate by c?

I explained in terms of undoing:

Here's the idea. Suppose, for example, that the point (5, 6) is in the graph of y = f(x). This means that f(5) = 6.

Now, consider the graph of g(x) = f(x + 4). Since we know that f(5) = 6, we know that g(1) = f(1 + 4) = 6. So the point (1, 6) is in the graph of g. This point is 4 units to the left of (5, 6). The same will be true of any point on the graph of g.

Do you see what happened? In order to find the input to g that will correspond to 5 being the input to f, I had to SUBTRACT 4 from x -- that is, shift the point LEFT. In other words, when you replace x with x + c in a function, x must be c LESS than what it was in the original function in order to get the same output.

Similarly, when you replace x with cx, x must be DIVIDED by c in order to get the same output.

In 2014, we got a very different question, asking about the terminology of stretches:

Stretching Definitions, and Compressing

The original function is y = f(x). Given a new function y = f(cx).

Case (i) If 0 < c < 1, the graph is stretched horizontally by a factor of c units

Case (ii) If c > 1, the graph is compressed horizontally by a factor of c units

For case (i), why is the stretch factor not equal to 1/c? For case (ii), is the stretch factor c or 1/c? Do we always measure "stretch" in c units? and compress in 1/c units? I am really confused by this.

If you don’t see what Jason is concerned about, notice that for \(f\left(\frac{1}{2} x\right)\), his book calls it a stretch by a factor of \(\frac{1}{2}\), where I would call it 2; and it calls *f*(2*x*) a compression by a factor of 2, which could also be called a factor of \(\frac{1}{2}\) (since that is what coordinates are multiplied by). Is the book wrong?

I had observed that this terminology varies, and took the opportunity to research it a bit, breaking down what I found into cases:

I've seen this taught several ways -- which should tell you that it's really not important.

I(a). Some texts always use a factor greater than 1; they say that

f(2x) is compressed by a factor of 2 (meaning x is divided by 2)

f(x/2) is stretched by a factor of 2 (meaning x is multiplied by 2)

In general, f(cx) is stretched by a factor of 1/c if 0 < c < 1, and compressed by a factor of c if c > 1. Here the "factor" cited is always greater than 1. For example, see

http://blogformathematics.blogspot.com/2012/12/transformations-of-functions.html

I(b). An alternative presentation of the rule is:

In general, taking c > 1, f(cx) is compressed by a factor of c, and f(x/c) is stretched by a factor of c. See, for example,

http://archives.math.utk.edu/visual.calculus/0/functions.12/

II(a). Others say that

f(2x) is compressed by a factor of 1/2 (meaning x is multiplied by 1/2)

f(x/2) is stretched by a factor of 2 (meaning x is multiplied by 2)

In general, f(cx) is stretched by a factor of 1/c if 0 < c < 1, and compressed by a factor of 1/c if c > 1. That is, x coordinates are multiplied by a factor of 1/c, which is a stretch if 1/c > 1 and a compression if 0 < 1/c < 1. For an example, see

http://math.kennesaw.edu/~sellerme/sfehtml/classes/math1113/transformation.pdf

II(b). The following site seems to follow this approach, but in the horizontal case, they say the factor is c, not 1/c. They give no examples, so I don't know if they realize that they are saying f(x/2) is "stretched by a factor of 1/2," which makes no sense!

http://www.regentsprep.org/Regents/math/algtrig/ATP9/funclesson1.htm

III. Yet others seem to avoid the terminology of "by a factor of c," and just say that f(cx) divides x by c, which is a stretch if 0 < c < 1 and a compression if c > 1.

The link under II(b) is broken; here is an image of what it said about this:

So, we find “compressed by a factor of 2” (I), and “compressed by a factor of 1/2” (II), meaning the same thing. As long as we focus more on the word than the number, we’re okay:

The first view (I) takes the word "stretch" or "compress" as telling you whether to multiply or divide, and makes the factor always greater than 1. The second view (II) takes the operation to be always multiplication, with stretching implying a factor greater than 1 and compressing implying a factor less than 1.

I tend to prefer II(a), because "factor" implies multiplication, not division, and because it seems more natural to talk about compression as being by 1/2 rather than by 2. But I can appreciate the reasons for the other view, and I don't mark students wrong for expressing it that way. As long as they use the word "stretch" or "compress," I know what they mean; "compress by a factor of 2" can't mean multiplication by 2, so it must mean division by 2.

The rules you quoted are the II(b) version, which as I noted is strange, and certainly not standard. I would not accept that usage from my students, because saying that the graph is stretched by a factor of 1/2 is incomprehensible, and I would not be sure they know what they are doing.

You may want to adopt version III for your own thinking, as a way to avoid trouble in the future. But be aware that you will find all sorts of terminology in other sources, including future classes you take. Ultimately, you should do whatever your textbook and teacher say ...

... and hope that they agree: if not, the teacher must be aware of the issue and will either tell you not to worry about it, or make it clear what he expects.

Jason replied,

Dr. Peterson, thank you for your detailed explanation and the websites. I agree that II(a) makes more sense, and now have a better understanding of the concept.

To see something of what our goals are, I thought I’d share part of a **Guide to Writing Responses**, which I believe was first written around the time in 1998 when many of us were invited to join.

After a brief discussion of technical details (how to type math with the primitive facilities of the day, and how to do the writing), there is a list of 12 principles for writing an answer. I’ll quote them, and add a few comments.

1. Make sure your math is correct! Answer only questions for which you are absolutely sure that you understand the mathematical concepts that motivate the question. In addition, even when you are very sure that you understand the math behind a question, it is important to be extremely careful when responding to questions. Many math doctors have made errors on simple problems. If you are careful to proofread, you can avoid this!

Correctness is goal #1, and always has been.

One of the benefits of being a Math Doctor is having other Math Doctors around you. Each of us can specialize in what we know best, knowing that someone else (probably) can pick up the questions for which we don’t have solid knowledge.

But, yes, we have made mistakes, and though published answers were edited, some of the errors have remained for years until someone wrote to us about them. We have always been grateful for such corrections. We can’t make corrections in the legacy archive, but you can still write here when you find an error, and we can post an errata page.

2. Focus on helping students learn to think in the creative ways that foster a deep understanding of the mathematical concepts lying behind problems. After having worked with us, we would like students to be able to solve not only the problems that they sent us, but also problems that require similar kinds of thinking.

Make sure that your answer gets at the math that underlies the problem. Avoid giving algorithmic solutions to problems; rather, help the student understand the mathematics that motivates the question. Furthermore, write your answer in such a way that the mathematical ideas and thinking involved in solving the problem are highlighted; after all, it is these ideas and ways of thinking that make math the subject that it is. When you do these things, the student will be able to take his or her newly gained knowledge and apply it to other problems that require the same kinds of thinking; he or she will have truly learned from your response.

Goal #2 is deep understanding, which is far more important than giving answers.

Learning has not really occurred until the student understands the concepts, and can creatively apply them to different problems. So, whenever possible, we want to go deeper than the specific problem that was asked about.

3. Write as clearly as you can. Remember that many of the students who write to us are struggling with math. Because of this, you should write as clearly as you can when you are writing about math. If the students have to struggle with something, we want them to struggle with mathematical concepts, not with what we mean when we say something.

Goal #3 is communication. Math is hard enough; we don’t want to complicate things.

Many technically-minded people are not comfortable with writing, which is one reason for the training process, to weed out people who aren’t consistently able to do this well. All of us, I’m sure, have written less clearly than we would have liked to, but this is a goal.

4. Explain not only how to do problems, but also why you do them the way you do. We want students to learn about math and become better mathematicians as a result of having asked us a question. As we all know, you cannot be a good mathematician unless you know exactly why you do what you do to solve problems; knowing only the steps one must take to solve a problem without understanding why one must undertake them will only get you so far.

This is another aspect of #2; understanding involves not only knowing what to do, but why. Math is not done by memorization, but by logical reasoning, which includes identifying when a procedure is applicable, and when something else needs to be done.

5. Let students have a role in coming up with solutions to their problems. When students ask you how to solve specific problems, help them by setting up a solid foundation from which they can work to solve the problems on their own. It can be difficult to find the right balance between explaining concepts necessary to do the problem and letting students figure out at least part of their problems on their own, but as you work on it, you will get better at figuring out how much to leave to the student.

Interaction is an important part of teaching math; as we say, math is not a spectator sport. We have to get the student to do the work, not just watch us do it.

In those early years there was no efficient way to respond to us (messages came in by email, and were not put into threads), so there were not many extended conversations; but when possible we would try to stop short of giving an actual answer, so that the student would be able to apply his or her own mind to finishing the problem. Sometimes we would ask the student to get back to us, but they never did.

Sometimes we just explain the concepts or give a similar example, leaving the original problem for the student to do; or we might just do the first part or give a hint. How to do this varies from one problem type to another, and sometimes you can’t get around demonstrating all the work.

6. Read questions carefully so that you can tailor your responses to the students’ needs. Students sometimes tell us what they have tried to do to solve problems, providing information about how far they have been able to get. Use this information to write your answer. We want to address their attempts at solutions either by explaining why what they did was correct, or by showing them where they went wrong.

Ideally, we want to see the student’s thinking, in order to have a better idea what knowledge to assume. Knowing their context (grade level, course, etc.) can also be important. Once it became easier to dialogue, we would more often just ask for such information before giving more than just an initial hint; that is how we often operate now. Sometimes when they respond by showing their work, we discover that their difficulty is in an entirely different area than we expected; and we are able to commend them for what they got right while helping with the error or getting them past a point where they were stuck.

When there is work to evaluate, it is important that we both help them to see their errors, and also give positive help toward a correct answer. Correcting a misunderstanding is more important than merely providing a correct explanation.

7. Look out for questions that have multiple interpretations; when you find them, be sure to make explicit the fact that the problem can be interpreted in different ways. It is easy to read a question quickly and think that you know what a student is asking; however, always read through questions carefully to make sure that there is only one reasonable interpretation of the problem. If you see that a problem may be interpreted in more than one way, address this in your response to the student. Depending on the question, you will want to respond in a number of ways.

If it seems easy to address all of the possible interpretations, you may do so after you have explained each of them thoroughly. However, it may take too much time to go through each interpretation and one may seem more likely than the rest. You might want first to acknowledge the fact that there are multiple interpretations and then to address one of them. Of course, in this case you should be sure to encourage the student to write back if he or she wanted us to respond to a different question.

Sometimes the number of interpretations is so large and the meaning of a problem is so unclear that you will want to wait to answer the question until you know which interpretation was intended by the student. In this case, it is fine to e-mail the student asking for clarification. To help the student, however, you should include some of the possible interpretations you have come up with on your own. This will help students see where wordings are ambiguous and should thus help the student to reword problems to avoid ambiguity.

I often give similar advice to students when a problem is ambiguous: State your interpretation of the problem before giving your answer, so that it will be clear that you are giving a valid answer to the problem as you perceive it, even if it is wrong according to the problem’s intent. By showing possible interpretations of a student’s question, choosing one, and giving appropriate help, we are maximizing the chances of being useful.

8. When possible and appropriate, use students’ questions as a jumping-off point for discussions of related math topics. If a question clearly leads into a mathematical concept/idea that you find to be interesting, tell the student about it! Even better, ask the student some extension questions that may lead to discovering or thinking about these ideas.

We want to encourage curiosity, because investigating one’s own ideas is great for motivation and for deeper learning. Demonstrating our own curiosity about their topic, or interest in their ideas, can be a way to do that.

9. Write using a tone that is friendly and conversational. Responses written in a nice and easy tone are more enjoyable to read, and they make math less scary for those students for whom math is difficult.

One of the purposes of the service from the start was to help students become comfortable writing about math. Anything that enhances comfort is good. So one of our goals is to be different from textbooks, both in style and in substance.

10. Be careful with technical language; if you think that there is any chance that the student to whom you are writing will not know a word that you plan to use in your response, either do not use the word or clearly define that word. Many of the kids who write us struggle with the language used by mathematicians, so make sure that everything you write will be understandable to them. You will probably have to change your vocabulary somewhat so that you use less math jargon. Of course, you shouldn’t avoid technical language altogether; after all, students need to learn math vocabulary words. However, when you do use technical language, provide understandable definitions of words used so that students who don’t know the required vocabulary can follow your answer.

This is another important balance, between teaching important words, and communicating clearly to those who don’t yet know them. It’s also another reason we like to have some background information on a student, to give us a better idea what words and ideas to assume they know.

11. Be supportive and encouraging. Whenever students get something right, compliment them! Many people’s math problems have more to do with confidence than anything else and a little boost in confidence can go a long way, so be positive. Look at some very encouraging messages.

This is something I try to be particularly careful of in my face-to-face tutoring as much as online. (Actually, all of these principles have defined my interactions there.)

We commonly start off by pointing out the good work a student has done in the work he shows, or even in the phrasing of a question.

Finally, the last and perhaps most important principle:

12. Have fun, and be creative! If you are having fun writing a response, chances are the student reading your response will also enjoy it. Try out new things with your responses – make some jokes (if that is in your nature) and just have a good time!

Enjoying math can be infectious. (Well, some people seem to be immune, but we can try to expose them.)

We still try to do the same things in our interactions today, and I think these points are useful for anyone doing tutoring, online or off.

For most of these points, several examples were provided in the original document.

Looking for newer examples, it occurred to me to search instead for students’ thanks, to see when they have recognized these qualities in our help. Here are some, listed by principle:

**#2: Focus on understanding**

The Riemann Zeta Function: Extended Confusion about an Analytic Continuation

Now I understand. The beauty of mathematics could be appreciated only when it is understood properly.

**#3 Clear communication**

Why Does Height Formula Use -16 Instead of -32?

Fuzzily getting clearer (a lot) - thanks for getting down to my level.

**#4: Explain why**

Proof of Derivative for Function f(x) = ax^n

Thank you very much! This is very cool, and now I understand why the 'quick method' actually works. Thanks for taking the time to answer my question.

**#5: Let them do the work**

One Variable in Two Radicals

I was going to follow up with one final question for you -- about why the quadratic formula introduces some roots that are extraneous -- and you answered my question before I even asked it! WOW! Thank you for explaining that, and also for not just giving me the answer but for offering a suggestion and letting me work through it myself. You are also a good cheerleader for students. Awesome instructor, Dr. Peterson!

Adding a 6-Inch Layer of Gravel

Thank you very much for taking the time to break my problem down into an easier way of managing it. It helped me out a great deal. I understood the problem and was able to figure it out with starting with the easier problem first. Thank you for your time. Have a great day!

**#6: Tailor to students’ needs**

Isolating a Variable in a Tricky Place, or Raised to a Power

Thank you so much, that is exactly what I needed to know. You explained it perfectly and I actually understand it now.

**#8: Ask extension questions**

Ogive, More or Less

A good teacher teaches, whereas a great teacher inspires; and Dr. Math, you have inspired me to learn more. Specifically, I'm going to learn about discrete and continuous probability distributions very soon. :> But I've learnt a lot already. Thanks!

**#9: Be friendly**

Hockey League Tournament Schedule

I really just want to say I can't thank you enough for your time and effort and the very speed with which you responded again. It is so, so gratifying I'm genuinely moved. You don't know me from Adam, as we say - which means you have no reason to be so helpful and so generous, and there's nothing in it for you. These are controversial times when the Internet is creating so much adverse feeling. The privacies and sensitivities of people are so often abused by mischievous and sometimes evil people, hiding behind the anonymity of the computer. To be treated in so friendly a manner and to be given so much assistance on what many people would dismiss as a trivial matter, is positively uplifting.

**#10: Define words**

How Can the Product of Two Radicals Be Finite?

Thank you for taking the time to answer my question so thoroughly. I think I need to eliminate "infinite" from my vocabulary so I don't get confused again! You provided several explanations and scenarios and you MADE it make sense. I really appreciate it and it definitely helped! You're great :)

**#11: Encourage**

Accuracy in Measurement

Thanks for your reply. It really helped me get over this paradox, and made me feel much more assured towards math. Thanks!!!

**#12: Have fun**

Can Rewriting P -> Q as ~Q -> ~P Lead to a False Conclusion?

You're so nice. We've really learned some new things, and one of us started to get interested in mathematics again.

Finally:

Is There a "Discriminant" for a Quartic Equation ... in Closed Form?

I love Dr. Math. I love what it does. I love how valuable it is -- in my opinion, more than gold. Because while gold can wrap around someone's finger or neck and gleam, Dr. Math helps to improve the mind and mathematical abilities, which I see as priceless, especially when they are as polished as the ones you have. You have shared your knowledge with me; and for that ... and for your input ... and for the extensiveness of your responses ... I am extremely grateful!

In the fall of 1994 the Math Forum at Swarthmore College (then the Geometry Forum) started an email program called “Ask Dr. Math”.

The idea was to answer any and all math questions from K-12 students as fast and as well as we could. We began with an email address and a dozen math students who signed on to answer questions in shifts around the clock. The project worked. In fact, it worked better than we imagined it would! Instead of ten questions a week, we now get over forty questions a day. We have answered thousands of questions, and more keep coming. The volume of questions has grown so large that even now that we have more doctors we still cannot keep up. Therefore, we are inviting qualified undergrads, retired teachers and practicing mathematicians to help us turn Dr. Math into one of the strongest mathematical resources on the internet.

As a Math Doctor you will have the opportunity to answer the math questions of K-12 students from all over the world. Questions may range from “How do you add really big numbers” to problems involving sphere packing and complex analysis.

You will be able to answer some questions in five minutes. Some might take an hour. Some you will never solve. Some have never been solved by anyone. All will make you a better teacher and student of mathematics. You will also be reminded of the reasons why you got into mathematics in the first place: because when you saw the Pythagorean Theorem it was beautiful, and when you saw the Fundamental Theorem of Calculus, you thought it was fun. You’ll have not only the joy of recognizing that you can help those who are where you were 5, 10 or 30 years ago, but also the opportunity to help give them a love for mathematics.

All it takes to be a Math Doctor is a knowledge and love of mathematics, a knack for explaining mathematics, and access to a good browser like Netscape 3.0 or Microsoft Internet Explorer 2.1. If you meet these criteria we’d love to have you join us.

That is why I joined, and why I can’t stop!

Much has changed since then: The number of questions became hundreds per day, then gradually fell off again as the Internet became bigger and more options became available; but we remained special, not in our technology but in our people and our tradition. We got questions far beyond the original level (though the focus is still on K-12 or early college material), and the archive became huge, making us the great resource that had been hoped for. (We received almost a million questions over 23 years, a small percentage of which were published in the archive.) It is still fun to answer a wide variety of questions, and to converse both with students who have the drive to understand beyond what they are being taught, and with students or adults who need support and encouragement to learn the basics.

A year ago, having been cut loose from the original organization, we built this new little site in order to continue answering questions and drawing attention to our old answers. We hope that as more people discover us, we can continue meeting these needs into the future.

Next time, I’ll continue the first-anniversary focus on the past by showing some of the training materials that were used to define what “a knack for explaining mathematics” means. We care about doing it right!

Here is the initial question:

Hi,

I am trying to calculate the domain and range of this function f(x)= (x^2 – 3x + 2)/(x^2 + x – 6).

I am uploading a rough image of my attempt.

The answer for range which I am getting is [1/5, infinity) but the correct answer given in my textbook is R-{1/5, 1}.

Please help me correct my answer. Where am I doing the mistake?

I will be thankful for help!

His notation for the correct answer means “all real numbers except 1/5 and 1”. His own answer is very different: all real numbers starting at 1/5.

First, let’s examine what he has done. He wants to find the range of $$f(x) = \frac{x^2 - 3x + 2}{x^2 + x - 6},$$ so he starts to solve the equation $$\frac{x^2 - 3x + 2}{x^2 + x - 6} = y,$$ by multiplying by the denominator and simplifying. The result is the quadratic equation $$(y-1)x^2 + (3+y)x - (6y + 2) = 0.$$ Solving this for *x* would give the inverse relation (*x* as a function of *y*); this technique is demonstrated in the previous post I mentioned above. We need to find the *domain* of this inverse relation, which will be the *range* of the original function. (That is, we’re finding the values of *y* for which an *x* exists.) In order to do that, we find the discriminant \(D = b^2 - 4ac\); the range will be the set of values of *y* for which this is non-negative, so that there is at least one real value of *x* associated with that *y*. (As we’ll see, a little more than that will be needed.)
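Written out, with \(a = y-1\), \(b = 3+y\), and \(c = -(6y+2)\) read off from the quadratic above, the discriminant expands roughly as follows (this is a sketch of the step, not a transcription of the student's image):

$$D = (3+y)^2 + 4(y-1)(6y+2) = (y^2 + 6y + 9) + (24y^2 - 16y - 8) = 25y^2 - 10y + 1 = (5y-1)^2.$$

Since this is a perfect square, it is zero only at \(y = 1/5\) and positive everywhere else.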

The result is an inequality that simplifies to $$25y^2 - 10y + 1 \ge 0.$$ But then he solves this inequality incorrectly …

I replied, pointing out this error (but making one of my own) and suggesting another way, by which I had obtained the correct answer:

One specific error in your work is the step from (y – 1/5)^2 ≥ 0 to y ≥ 1/5. Polynomial inequalities don’t work that way; the solution in this case is y ≠ 1/5, because the square is positive unless the base is 0.

I solved this a different way; trying it your way led to some subtle steps I almost missed. I did get the correct answer both ways.

I first simplified the function by canceling (x – 2). Then I found the inverse and determined its domain (which could be done directly due to the simplification). I also had to eliminate the value of y corresponding to the “hole” in the original function.

My method started by factoring the numerator and denominator and simplifying:

$$f(x) = \frac{x^2 - 3x + 2}{x^2 + x - 6}$$

$$f(x) = \frac{(x-2)(x-1)}{(x-2)(x+3)}$$

$$f(x) = \frac{x-1}{x+3},\ x\ne 2$$

This is equivalent to the original, as long as we take note that 2 is not in the domain. (That is the “hole”.)

Solving for *x*, we have

$$\frac{x-1}{x+3} = y$$

$$x-1 = y(x+3)$$

$$x - 1 = yx + 3y$$

$$x(1 - y) = 3y + 1$$

$$x = \frac{3y + 1}{1 - y}$$

This is defined for all *y* except 1. But we also have to exclude the value of *y* for which \(x = 2\) in this simplified equation, which is $$y = \frac{2-1}{2+3} = \frac{1}{5}.$$

The range is therefore, as the book said, all real numbers except 1 and 1/5.
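As a quick numerical sanity check of that conclusion, here is a minimal Python sketch. The function names (`f`, `candidate_x`, `in_range`) are my own, chosen for illustration; the code simply takes a value of *y*, computes the only possible *x* from the inverse formula above, and tests whether the original function really returns that *y*. Exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

def f(x):
    """The original function; returns None where the denominator is zero."""
    x = Fraction(x)
    denominator = x * x + x - 6
    if denominator == 0:          # x = 2 (the hole) or x = -3 (the asymptote)
        return None
    return (x * x - 3 * x + 2) / denominator

def candidate_x(y):
    """Solve (x - 1)/(x + 3) = y for x; returns None when y = 1."""
    y = Fraction(y)
    if y == 1:
        return None
    return (3 * y + 1) / (1 - y)

def in_range(y):
    """True if some x in the domain of f really gives f(x) = y."""
    x = candidate_x(y)
    return x is not None and f(x) == Fraction(y)

# A few sample values, including the two excluded ones.
for y in [0, 2, -3, Fraction(1, 5), 1]:
    print(y, in_range(y))
```

The sample values 0, 2, and -3 are reported as being in the range, while 1/5 (whose only candidate is the hole at *x* = 2) and 1 (which has no candidate at all) are not.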

But I had mentioned further subtleties if we don’t take this way, and the student chose to persevere in order to learn how to handle obstacles:

Thank You Doctor Peterson for help!

I am trying to proceed by my own method.

So, you are saying that if (y – 1/5)^2 ≥ 0 then it can’t be y ≥ 1/5, and y ≠ 1/5.

But, there is an equality symbol as well.

So, I think y = 1/5 is true. Why do you think it is false?

So, once again the range which I am getting is from 1/5 (included) to infinity.

But, the correct range given in my textbook is All Real except 1/5 and 1.

Please help me correct my answer.

I will be thankful for help.

His correction of my correction is correct! I had treated the inequality as “>” instead of “≥”. But that only deepens the mystery: How is 1/5 excluded from the domain of the inverse? I gave some hints for extending his method to find the correct answer:

I did miss the equality part (probably because of the correct solution you’d mentioned); but I hope you see that the solution of (y – 1/5)^2 ≥ 0 is not y ≥ 1/5, but all real numbers! A square is never negative.

So what is special about 1/5 that would exclude it from the domain? The discriminant there is 0, so there is only one solution for x; what is that one solution? Find out, and then look back at the original equation!

This is one of the subtleties I mentioned in your method; but it also occurred in mine, and I mentioned it in my summary of what I did.

Now, what about the exclusion of 1? That’s the other major subtlety: Look at your quadratic equation in x; is it always quadratic? If it isn’t, then the discriminant isn’t really relevant; you have to take this as a special case.

So far, his method has not yielded any numbers to be excluded; they will both be special cases, somehow. So figuring this out will be useful for understanding the whole process.

He replied:

Ok so I plugged y = 1/5 in the equation and got x = 2, which is of course rejected because x ≠ 2.

So, my final question is that how would someone decide which values to plug in the equation to check whether it is possible or not? (For example, how would someone decide that he/she need to check for 1/5 and not for any other number ?)

This is actually the easier of the two exclusions to deal with, but it turns out to be not really about checking *y* = 1/5, but rather checking *x* = 2:

I suppose the fact that this is a special point (where the discriminant is 0) suggests checking; but, really, from the start you should plan to check the value of y when x=2 (in the original equation, after removing the “hole”), and exclude that, as long as it isn’t associated with any other value of x. (That’s another subtle point.) So dealing with that point is really separate from the work you did.

Ultimately, I would want to sketch the graph along with any algebraic methods I use, to make sure I haven’t missed anything important (like that hole).

So, the correct way to find this really has nothing to do with finding 1/5 in the course of his work, but merely with his failure to even consider the hole! Because he never factored the numerator and denominator, he was unaware of the need to exclude the *y* associated with the hole.

But this does leave a curiosity: How did his method end up coming so close, with 1/5 as a special point at all?

Let’s look at his work, but with the numerator and denominator written in factored form:

$$\frac{(x-2)(x-1)}{(x-2)(x+3)} = y\ \Rightarrow\ (x-2)(x-1) = (x-2)(x+3)y.$$

We have the factor \((x-2)\) on both sides, so that when \(x = 2\), the equation is true regardless of the value of *y*. In fact, we could therefore rewrite that equation as $$(x-2)\left((y-1)x + (3y+1)\right) = 0,$$ whose solutions are \(x = 2\) and \(\displaystyle x = \frac{3y+1}{1-y}\). So when the discriminant is 0 (so that there is one solution), these are equal, and \(\displaystyle\frac{3y+1}{1-y} = 2\). This is, of course, when \(y = 1/5\)! So it just happens (!) that the check for the discriminant did locate the hole, and there really was reason to check that point. I doubt I would ever have guessed that.

Now let’s finally look at a graph of this function:

We can see that the domain excludes -3 (the vertical asymptote) and 2 (the hole); and the range excludes 1 (the horizontal asymptote) and 1/5 (the hole).

(By the way, did you wonder at some point why the discriminant was almost always positive, indicating that there were *two* values of *x* for a given value of *y*? Did you assume that therefore the function must not be one-to-one? It’s because the equation we were working with there consisted of this graph together with the vertical line *x* = 2.)

There’s still one thing missing, though: How would the student’s method have found that *y* can’t be 1 (the asymptote)? This is my second hint, which I haven’t yet elaborated. Look back at the student’s quadratic equation: $$(y-1)x^2 + (3+y)x - (6y + 2) = 0.$$ The discriminant only applies to quadratic equations; and this is *not* a quadratic equation when *y* = 1! So we should have checked that case before finding the discriminant. And this amounts to finding the horizontal asymptote. In my method, I actually found the inverse, so this asymptote was directly visible. In the method we are examining, we looked only at whether an inverse exists, and never actually saw it, so as to discover these details.
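To spell out that special case (a short check, using the quadratic exactly as written above): setting \(y = 1\) removes the \(x^2\) term and leaves a linear equation,

$$(1-1)x^2 + (3+1)x - (6\cdot 1 + 2) = 0 \ \Rightarrow\ 4x - 8 = 0 \ \Rightarrow\ x = 2.$$

The only candidate is \(x = 2\), which is not in the domain, so no allowed *x* gives *y* = 1, and 1 is excluded from the range.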

What’s the bottom line of all this? Don’t skip the factoring and simplifying when you work with a rational function!

Last time we looked at a formula for approximating the **mode** of grouped data, which works well for normal distributions, though I have never seen an actual proof, or a statement of conditions under which it is appropriate. We have also received questions about a much more well-known, and well-founded, formula to estimate the **median**. Here, it is possible to give a solid derivation, and to clearly state the assumptions on which it is based.

Here is the initial question, from 2007:

Derivation of Linear Interpolation Median Formula

Median, m = L + [ (N/2 – F) / f ]C. How does this median formula come? My teacher did not show and proof how does this formula come. Therefore, I just substitute and blindly use the formula. Can you help me?

This formula is used to find the median in a group data with class interval. The median is the value of the data in the middle position of the set when the data is arranged in numerical order. The class where the middle position is located is called the median class and this is also the class where the median is located. This formula is used to find the median in a group data which is located in the median class.

Median, m = L + [ (N/2 – F) / f ]C

- L means lower boundary of the median class
- N means sum of frequencies
- F means cumulative frequency before the median class. Meaning that the class before the median class what is the frequency
- f means frequency of the median class
- C means the size of the median class

I have tried to use an ogive graph to understand, but I still did not get how did this formula come.

Daya recognized that the formula is related to the ogive (also called the Cumulative Distribution Function, or CDF), but wasn’t able to complete the derivation. The formula is, again, $$m = L + \left( \frac{\frac{N}{2} - F}{f}\right)C.$$ For a well-explained source, see

Math is Fun: Mean, Median and Mode from Grouped Frequencies,

which I referred to last time; this says, under *Estimating the Median from Grouped Data*,

Estimated Median = \(L + \frac{(n/2) - B}{G} \times w\)

where:

- L is the lower class boundary of the group containing the median
- n is the total number of values
- B is the cumulative frequency of the groups before the median group
- G is the frequency of the median group
- w is the group width

I answered with a statement of what the formula does, and a quick derivation:

This is a linear interpolation (on the ogive graph, as you suggested), which finds where the actual median WOULD be if you assume that the data are uniformly distributed within the median class.

One way to derive the formula is just to note that N/2 is the number of data values BELOW the median, so N/2 - F is the number of data values in the median class that are below the median. Therefore, (N/2 - F)/f is the fraction of values in the median class that are below the median. This times C is that fraction of the class width; adding L gives the value at that position in the class.

In terms of the ogive (cumulative distribution), let's first just plot the actual cumulative frequency before each class, something like

[Text sketch of the ogive: cumulative frequency points rising to N, with the value F at the lower boundary L of the median class, a rise of f across that class, and a class width of C]

We don't know where the actual data points are, but if they are uniformly distributed within each class, we could connect the points above with straight lines. Your formula gives the x coordinate corresponding to y=N/2. See if you can derive it this way.
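One way to carry out that last step (a sketch under the same assumption, that the ogive is a straight line across the median class): the segment runs from the point \((L, F)\) to the point \((L + C, F + f)\), so along it

$$y = F + \frac{f}{C}(x - L).$$

Setting \(y = N/2\) and calling the corresponding *x* value *m*,

$$\frac{N}{2} = F + \frac{f}{C}(m - L) \ \Rightarrow\ m = L + \left( \frac{\frac{N}{2} - F}{f}\right)C.$$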

Our formula gives the x-coordinate of the point on the graph where y = N/2. Here is a better version of the graph:

In 2016, another student, Pramod, asked about the same formula, giving his own derivation that led to a slightly different formula:

Given this frequency distribution table:

| Class | Frequency |
| --- | --- |
| 60-70 | 4 |
| 70-80 | 5 |
| 80-90 | 6 |
| 90-100 | 7 |

n = 22

I used the following rationale to calculate the median.

Median data entry = (22 + 1)/2 = 11.5th entry from first = 11.5 - 9 = 2.5th of 6 entries through 80-90

Now, since I don't know the 6 data entries of median class, I assumed that they were distributed equally through 80 to 90 (10 class width):

81.667, 83.333, 85, 86.667, 88.333, 90

I used these in the formula

Median = L + {(n + 1)/2 - c.f.} * (h/f)

Here,

- L = lower limit of median class
- h = class width
- c.f. = cumulative frequency up to the preceding class
- f = frequency of median class
- n = total data entries/summation of frequencies

I got

Median = 2.5th data entry = (83.333 + 85)/2 = 84.1667

But in almost every statistics book I have ever studied, the formula for calculating median from a continuous frequency distribution table is given as

Median = L + {n/2 - c.f.} * (h/f)

I know very well that the median calculated from such data is not exact, since we know only the range of data entries -- not the actual data entries, themselves. But still, doesn't it make more sense to use my formula? Doesn't it give a more precise approximation? If you agree, why is the latter formula used in almost every textbook?

This was an excellent attempt, and just missed two details. I responded by first referring to the answer above (to which this question was later attached):

I discussed this formula for Daya, above, but I didn't go into the details of the derivation to confirm that that formula could not be improved upon.

I have a small problem with your example: you didn't clearly state how to interpret your classes. Let's take a closer look at your data.

| class | freq |
| --- | --- |
| 60-70 | 4 |
| 70-80 | 5 |
| 80-90 | 6 |
| 90-100 | 7 |

n = 22

Which class is 70 in? I will assume that 80-90 means 80 <= x < 90, as is commonly done for continuous data; if the values are integers, then the class could also be described as 80-89 (inclusive), but then our estimate would have to be rounded to an integer, so we would not get a similar formula.

When classes are described in terms of integer values, the lowest and highest values in a class are called the class limits. But in a formula such as this, we need to treat the data as continuous, so we use, not these class limits, but the class boundaries, which are real numbers halfway between classes. Here, the lower boundary of the median class would be 79.5, which is 0.5 below the lower limit, 80. (Note that the word boundary is used in both statements of the formula above.)

I didn’t take this distinction into account in my answer to Pramod; and his work suggests that he is in fact assuming continuous (real number) data.

The principal error in Pramod’s derivation was including the lower limit (or boundary) of the next class in the median class:

If the 6 values in the class 80 <= x < 90 are evenly spaced across these 10 units, then they are spaced 10/6 = 1 2/3 units apart. I would center them like this:

[Number line from 80 to 90: six values spaced 5/3 apart, with a gap of 5/6 at each end; a bar marks the median position between the second and third values]

Therefore, the 2.5th value is 83 1/3 -- that is, 80 + 2*5/3, not 80 + 2.5*5/3.

The standard formula gives

Median = 80 + [(22/2) - 9] * (10/6) = 80 + 2*5/3 = 83 1/3

This agrees with my answer.

If I had used the class boundary assuming integer values, the median would be $$m = L + \left( \frac{\frac{N}{2} - F}{f}\right)C = 79.5 + \left( \frac{\frac{22}{2} - 9}{6}\right)\cdot 10 = 82 \frac{5}{6}.$$ Everything in my line graph below would be shifted left by 1/2.

In the page above, the implication is that we would use the continuous CDF (for your example) like this:

[Text sketch of the piecewise-linear CDF for this example: cumulative frequency from 0 at 60 up to n = 22 at 100, with F = 9 at 80, f = 6 across the median class, class width C = 10, and the level 11 marked]

Linear interpolation puts the median 2/6 of the way from 80 to 90, giving 83 1/3 again.

Here is a more accurate graph:

The graph vindicates the formula.

The difference between my first approach and yours is that I was a little more careful to distribute the values uniformly within the entire interval; whereas your last value is right at the end of the interval (and, I think, really in the next interval!). The fact that this results in the same answer obtained for a piecewise-linear CDF is encouraging.

Note, though, that if we really had integer data, we couldn’t uniformly distribute 6 values across 10 units; that’s another sense in which the formula is only approximate. It necessarily assumes a continuous distribution, in addition to the piecewise-linear CDF.
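The arithmetic in this discussion can be reproduced with a short Python sketch. The function name `grouped_median` and the variable names are my own, chosen for illustration; the inputs are the numbers from Pramod's table, and exact fractions are used so that 83 1/3 and 82 5/6 come out exactly rather than as rounded decimals.

```python
from fractions import Fraction

def grouped_median(L, N, F, f, C):
    """Estimate the median of grouped data by linear interpolation.

    L: lower boundary of the median class
    N: total number of values
    F: cumulative frequency before the median class
    f: frequency of the median class
    C: width of the median class
    """
    return Fraction(L) + (Fraction(N, 2) - F) / Fraction(f) * C

# Classes read as 80 <= x < 90 (continuous data): lower boundary 80.
continuous = grouped_median(80, 22, 9, 6, 10)

# Classes read as integers 80-89: lower boundary 79.5.
integer_data = grouped_median(Fraction(159, 2), 22, 9, 6, 10)

# Six values centered in the class: a gap of 5/6, then steps of 5/3.
centered = [80 + Fraction(5, 6) + k * Fraction(5, 3) for k in range(6)]
midway = (centered[1] + centered[2]) / 2   # the "2.5th" value

print(continuous)     # 250/3, that is, 83 1/3
print(integer_data)   # 497/6, that is, 82 5/6
print(midway)         # 250/3 again
```

The third line of output is the point of the comparison: spreading the six values evenly across the whole class gives the same estimate as the standard formula.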

I had never heard of such a formula until 2007, when a question was asked about applying it in a special case. That answer wasn’t archived, but when we got another question about it a year later, it was time to publish what I had figured out. Here is the 2008 question, from Saptarshi:

Different Formulas for Calculating Mode

I am a M.B.A student. Our teacher tells a formula to find out mode, that is

Z=L1+(F1-F0)/(2F1-F0-F2)*i

where:

- L1 = lower limit of modal class
- F1 = modal class frequency.
- F2 = just after the modal class frequency.
- F0 = just previous the modal class frequency.
- i = class interval.
- Z = the mode value.

But I saw in most of cases the highest frequency is the mode. They don't use that formula. (I saw that when searching about mode in google). So why we need that formula? Can you please explain me.

I think he is saying that whereas he was taught this formula for the mode, most sources he found online do as I have usually seen, identifying only the class with the greatest frequency as the mode (actually the *modal class*). So, why was he taught this formula, and what does it mean?

The formula, which I now find more easily around the Web than I could back then, takes several forms. His, in more readable format, is $$Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2}\cdot i.$$ The form we had previously been asked about was a little different: $$Z = L_1 + \frac{d_1}{d_1 + d_2}\cdot i,$$ where \(d_1\) and \(d_2\) are the differences between the frequency of the modal class and those of its nearest neighbors.

I started this answer by stating what the formula does, and showing the two formulas to be equivalent:

This formula gives a linear interpolation to estimate the actual value of the mode from grouped data; otherwise, all you really know is the modal class (which is sufficient for many purposes).

Your formula can be written differently if we take

d1 = F1 - F0 (difference between modal class and previous class)

d2 = F1 - F2 (difference between modal class and next class)

Then d1 + d2 = (F1 - F0) + (F1 - F2) = 2F1 - F0 - F2, so the formula is

Z = L1 + d1/(d1 + d2) * i
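The equivalence of the two forms can also be checked numerically. The following is a minimal Python sketch; the function names and the sample frequencies are made up purely for illustration and do not come from any of the sources quoted here.

```python
from fractions import Fraction

def mode_from_frequencies(L1, F0, F1, F2, i):
    """Z = L1 + (F1 - F0)/(2*F1 - F0 - F2) * i, with integer frequencies."""
    return Fraction(L1) + Fraction(F1 - F0, 2 * F1 - F0 - F2) * i

def mode_from_differences(L1, d1, d2, i):
    """Z = L1 + d1/(d1 + d2) * i, with d1 = F1 - F0 and d2 = F1 - F2."""
    return Fraction(L1) + Fraction(d1, d1 + d2) * i

# Hypothetical modal class 20-30 with frequency 9,
# preceded by a class with frequency 5 and followed by one with frequency 3.
L1, F0, F1, F2, i = 20, 5, 9, 3, 10

z_frequencies = mode_from_frequencies(L1, F0, F1, F2, i)
z_differences = mode_from_differences(L1, F1 - F0, F1 - F2, i)

print(z_frequencies, z_differences)   # both print 24
```

Note that both functions divide by d1 + d2, so they fail if the modal class and both of its neighbors have equal frequencies; in that case there is no peak for the formula to locate.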

I have never found an explanation of the formula in a mathematical source that explains its proper derivation and the conditions under which it is valid; there are several sites that explain it after-the-fact as I will do below, but most sources I find are at a basic level where they just state the formula and tell students to use it. Most, in fact, just state that it gives the mode, whereas, as stated above, it is really only a *guess* — an estimate of what the actual mode *might* be, based on the shape of the histogram. We don’t really know how the data are distributed within any of the classes, so it is impossible to know the actual mode; it may not even be in the modal class. On the other hand, the actual mode may just reflect that some random data points happen to be identical; a number based on the overall shape may really be more meaningful! So this is a valid concept, at least in some situations.

One source that gives this formula with a proper description is

Math Is Fun: Mean, Median and Mode from Grouped Frequencies

which says, under “Estimating the Mode from Grouped Data”,

We can easily find the modal group (the group with the highest frequency), which is 61 – 65.

We can say “the modal group is 61 – 65”.

But the actual Mode may not even be in that group! Or there may be more than one mode. Without the raw data we don’t really know.

But, we can estimate the Mode using the following formula:

Estimated Mode = \(L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w\)

where:

- L is the lower class boundary of the modal group
- \(f_{m-1}\) is the frequency of the group before the modal group
- \(f_m\) is the frequency of the modal group
- \(f_{m+1}\) is the frequency of the group after the modal group
- w is the group width

I gave some references, and then quoted what I had said in answering the 2007 question:

The formula these sites give, with definitions of the variables, is (using the second site's version):

When data are already grouped in a frequency distribution, we can assume that the mode is located in the class with the most items. In order to determine a single value for the mode from this modal class, we use

mode = LBMo + [d1 /(d1+d2)] (Width)

where

- LBMo = lower boundary of the modal class
- Width = width of the modal class interval
- d1 = frequency of the modal class minus the frequency of the class directly below it
- d2 = frequency of the modal class minus the frequency of the class directly above it

Note that d1 and d2 relate to the classes on the left and on the right in the histogram. If there is no class on the left, then you can imagine a class with frequency zero. Then the formula applies easily.

Note that this source rightly said the formula only gives “*a* single value for the mode”, not “*the* actual value of the mode”.

My last paragraph above dealt with the issue the earlier questioner had been asking about.

The purpose of this formula is to identify one value within the modal class that seems likely to be the peak of the curve if you smoothed out the histogram. It does this by taking the value within the interval whose distance from the class on either side is proportional to how much less the frequency is on either side. You can see this by rewriting the formula: $$\frac{\text{mode} - L_1}{\text{Width}} = \frac{d_1}{d_1 + d_2}$$

That is, the distance from the lower bound (left end) of the modal class, as a fraction of the width of the modal class, is the ratio of the left difference to the sum of the differences.

In thinking about this relationship, I saw a graphical meaning to the formula (which I now see on various other sites; I’m sure I’m not the first to have seen it):

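There is a simple geometrical way you could find this point. Just draw lines from the top corners of the modal bar to the near corners of the neighboring bars, and the mode estimate lies at the intersection:

[Figure: a histogram with the modal bar standing on the interval that starts at L1, a shorter bar on its left, and a bar of intermediate height on its right. d1 and d2 mark the drops from the top of the modal bar to the tops of the left and right neighbors. The two diagonals from the top corners of the modal bar to the near top corners of the neighbors cross at a point X, and a dotted vertical line from X meets the axis at the estimated mode. The width of the modal class is marked below the axis.]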

This puts the estimated mode closer to the higher neighboring bar, which makes sense. (I’ll have more to say about that below.) If you’re not sure how this relates to the proportion I wrote, look for a pair of similar triangles …
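The geometric construction can be checked numerically. The following is only a sketch: the helper `line_intersection_x` is a name introduced here for illustration, and the numbers are borrowed from the example with modal class 85<91 (frequencies 0, 10, 8 for the left, modal, and right classes). It intersects the two diagonals directly, without using the interpolation formula, and then compares the result with the formula.

```python
def line_intersection_x(p1, p2, p3, p4):
    """x-coordinate where the line through p1, p2 meets the line through p3, p4.

    Points are (x, y) pairs. Uses the standard determinant form.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("the two lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return (a * (x3 - x4) - (x1 - x2) * b) / denom


# Illustrative values (assumed from the 85<91 example in this post).
L1, width = 85, 6
f_left, f_modal, f_right = 0, 10, 8

# Diagonal from the top-left corner of the modal bar
# to the near (top-left) corner of the right neighbor.
diag_a = ((L1, f_modal), (L1 + width, f_right))
# Diagonal from the top-right corner of the modal bar
# to the near (top-right) corner of the left neighbor.
diag_b = ((L1 + width, f_modal), (L1, f_left))

x_geometric = line_intersection_x(*diag_a, *diag_b)

d1 = f_modal - f_left
d2 = f_modal - f_right
x_formula = L1 + d1 / (d1 + d2) * width

print(x_geometric, x_formula)
```

Both values should agree, which is the similar-triangles argument in numerical form.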

I closed with an example (again, quoted from my response a year earlier, and using that writer’s notation, where “85<91” meant the six numbers starting at 85, and less than 91):

For an example, take these classes:

| Class | Frequency |
|---|---|
| 85<91 | 10 |
| 91<97 | 8 |
| 97<103 | 3 |
| 103<109 | 8 |
| 109<115 | 0 |
| 115<121 | 7 |

The modal class is 85<91.

- LBmo = 85
- width = 6
- d1 = 10 - 0 = 10 (since the frequency on the left is 0)
- d2 = 10 - 8 = 2 (since the frequency on the right is 8)

mode = LBMo + [d1 /(d1+d2)] (Width) = 85 + (10/12)(6) = 85 + 5 = 90

This is 5 from the left and 1 from the right, a ratio of 5:1, while the differences in frequency are 10:2.
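As a quick check of the arithmetic, here is a minimal sketch of the two forms of the formula quoted earlier. The function names are my own, chosen for illustration; the first uses the frequencies F0, F1, F2 directly, and the second uses the differences d1 and d2.

```python
def mode_from_frequencies(L1, i, F0, F1, F2):
    """First form: Z = L1 + (F1 - F0) / (2*F1 - F0 - F2) * i."""
    denominator = 2 * F1 - F0 - F2
    if denominator == 0:
        raise ValueError("modal class and both neighbors have equal frequency")
    return L1 + (F1 - F0) / denominator * i


def mode_from_differences(L1, i, d1, d2):
    """Second form: Z = L1 + d1 / (d1 + d2) * i."""
    if d1 + d2 == 0:
        raise ValueError("modal class and both neighbors have equal frequency")
    return L1 + d1 / (d1 + d2) * i


# The example above: modal class 85<91, width 6,
# no class on the left (treated as frequency 0), frequency 8 on the right.
z_first = mode_from_frequencies(L1=85, i=6, F0=0, F1=10, F2=8)
z_second = mode_from_differences(L1=85, i=6, d1=10 - 0, d2=10 - 8)

print(z_first, z_second)
```

Both calls should give the same value, 90 up to floating-point rounding, matching the hand calculation.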

The next question about this formula was in 2015, from Gaurav:

Mode's Fickle Formula?

The formula for mode is not telling me the actual mode. In fact, after grouping data, I have found many situations where the mode changes. For example, given these data:

1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4

The mode is 1. But after grouping data, as below, the mode becomes approximately 3.3:

| CLASS | FREQUENCY |
|---|---|
| 1-3 | 5 |
| 3-5 | 6 |

Why does the mode of data change like this?

It appears that Gaurav had not been taught that the formula gives only a *guess* at the mode, and can’t be expected to give the actual mode, since it doesn’t have access to the actual data. But the question provided a good opportunity to examine more closely what the formula actually does. I replied:

The formula you have presumably been given for the mode of grouped data does not necessarily give the actual mode. Rather, it gives you a guess that is considered reasonable under some conditions. When you group data, you lose information, so you should expect not to be able to recover detail using any formula. In other words, the mode didn't change; you just guessed the mode from insufficient data. I don't actually know of any theoretical basis for the formula that would make it reasonable to expect it to be correct for some particular kind of data (e.g., approximately normal). But given the questions that we math doctors routinely see about this subject, it appears that it is commonly taught without explaining what the formula really is: an approximation, at best.

I gave a link to the answer above, to make sure we were talking about the same formula. Then I showed how the actual data provided (in the form of a “dot plot”) compare to the histogram:

Note that your data are not normally distributed, so it is not at all surprising that the formula would not work. Also, the actual data (*'s) and the grouped data (bars) look quite different:

[Figure: a dot plot of the data (four *'s at 1, one at 2, three at 3, three at 4) drawn inside the two histogram bars for the classes 1-3 and 3-5, with the bar on the right being the taller one.]

Looking at that, we see that the mode of the actual data is not even in the modal class; this is because the data are not smoothly distributed, so the grouping changes its character. (My guess is that the formula is considered valid, as I suggested, for normally distributed data; it would be at least reasonable for a smooth and symmetrical distribution.)

We should check his work with the formula. Using the formula in the first form I showed above, $$Z = L_1 + \frac{F_1 - F_0}{2F_1 - F_0 - F_2}\cdot i,$$ we have \(L_1 = 3, F_0 = 5, F_1 = 6, F_2 = 0, i = 2\) so $$Z = 3 + \frac{6 - 5}{2\cdot 6 - 5 - 0}\cdot 2 = 3 + \frac{1}{7}\cdot 2 = 3.29.$$ Here I took \(L_1\) to be 3, the lower class *limit* as stated in the first form I quoted above, rather than 2.5, the lower class *boundary*, as in most versions I have found, in order to get his answer. I think the latter is the proper definition of the variable; I hadn’t noticed this discrepancy until now.

Gaurav asked another question:

Why would the mode of grouped data depend on the frequency of pre- and post-modal classes?

This is essentially asking for a deeper explanation of the formula. I replied:

The page I referred you to explains the formula as well as I can. The basic idea is that if you have data that looks like a normal distribution (one symmetrical hump), but group the data, the classes on either side would be asymmetrical if the actual mode is not centered in the modal class; so looking at the adjacent classes can help estimate where the mode would be within the class. Here are two examples:

[Figure: two histograms, each with a smooth curve (*'s) drawn over the bars. In the one labeled "symmetrical", the peak of the curve is at the center of the modal class and the bars on either side are equal in height. In the one labeled "asymmetrical", the peak is toward the right of the modal class and the bar on the right is higher than the bar on the left.]

The symmetrical histogram should have its mode in the middle of the modal class. The histogram on the right -- with a higher bar on the right of the modal class -- should have its mode closer to the higher side. The formula does this in the simplest possible way.

I have made a histogram by binning the standard normal distribution in various ways, and found that the formula does give the mode quite accurately in that case. When I did the same for a triangular distribution, it was less accurate.
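An experiment of this kind can be sketched as follows. This is not the exact procedure I used, only an illustration under stated assumptions: class frequencies are taken as exact probabilities from the standard normal CDF (computed with `math.erf`), the class width and the `offset` of the class edges are arbitrary choices, and the function names are made up for this sketch.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binned_mode_estimate(width, offset, n_classes=40):
    """Bin the standard normal into equal classes and apply the mode formula.

    Class edges are offset + k*width for k = -n_classes..n_classes, so the
    classes reach far into both tails. The "frequency" of each class is its
    exact probability, an idealization of a very large sample.
    """
    edges = [offset + k * width for k in range(-n_classes, n_classes + 1)]
    freqs = [normal_cdf(hi) - normal_cdf(lo) for lo, hi in zip(edges, edges[1:])]
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of modal class
    d1 = freqs[m] - freqs[m - 1]
    d2 = freqs[m] - freqs[m + 1]
    return edges[m] + d1 / (d1 + d2) * width


# The true mode is 0; compare the estimate for several placements of the classes.
for offset in (0.0, 0.1, 0.3, 0.5):
    print(offset, binned_mode_estimate(width=1.0, offset=offset))
```

Replacing `normal_cdf` with the CDF of another distribution (a triangular one, say) gives the corresponding comparison for that shape.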

Here is a question from 2016:

Breaking the Mode

How do you find the mode of this grouped data?

| data | freq |
|---|---|
| 10-14 | 5 |
| 15-19 | 12 |
| 20-24 | 12 |
| 25-29 | 10 |
| 30-34 | 4 |

I know the mode formula:

Mo = L + (d1/(d1 + d2))*width

I calculated its parts like this:

- L = 14.5
- d1 = 12 - 5 = 7
- d2 = 12 - 10 = 2
- width = 24.5 - 14.5 = 10

But I'm confused about the last two. Should d2 = 12 - 12 = 0? Should the width be 5? From there, I went on to determine

mode = 14.5 + 7/(7 + 2)*10 = 14.5 + 7.8 = 22.3

Is my work true?

The answer seems reasonable (it is at least within a modal class). But does the formula work when the “modal class” is double-wide?

First, we have to keep in mind that we don’t even know what it would mean for an answer to be correct, since we don’t know the actual data! But I answered:

The formula you are using does not really tell you "the mode"; it just makes a reasonable estimate of where the mode might be if the underlying distribution is, say, approximately normal. Since it is not exact in the first place, it probably doesn't matter much how you apply it in special cases. If you have been taught the formula without any further explanation, then you can't be expected to follow any particular rules for this case. I have never found a source for this formula that explains its theoretical basis, or the conditions under which it should be used, or how it applies in unusual cases (which should be an inference from the theory, if there were one). I've explained what I can guess from the formula, and from what sources I do find, here:

Different Formulas for Calculating Mode
http://mathforum.org/library/drmath/view/72977.html

This explanation for it assumes generally that each class has the same width, so it doesn't quite apply when the "modal class" has twice the width of the others, which is the way you are treating it.

I made a suggestion, to rework the classes so they all have the same width, which is that of the double modal class:

I would probably rework the data so that there are fewer (equal width) classes, and just one modal class:

| data | freq | |
|---|---|---|
| 5- 9 | 0 | [added implied empty class] |
| 10-14 | 5 | |
| 15-19 | 12 | |
| 20-24 | 12 | |
| 25-29 | 10 | |
| 30-34 | 4 | |

| data | freq | |
|---|---|---|
| 5-14 | 5 | [combined classes in pairs] |
| 15-24 | 24 | |
| 25-34 | 14 | |

The formula applies directly now:

- L = 14.5
- d1 = 24 - 5 = 19
- d2 = 24 - 14 = 10
- width = 24.5 - 14.5 = 10

Mo = L + (d1/(d1 + d2))*width = 14.5 + (19/(19 + 10))*10 = 21.05

Again, that seems to fit a little better with the derivation, but I don't think it makes much difference, since there is really no "correct" mode anyway! Your answer is not necessarily a bad one.
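The regrouping can be sketched in a few lines. The helper names `merge_pairs` and `estimate_mode` are introduced here only for illustration, and the sketch assumes equal class widths, a single modal class, and class boundaries at 4.5, 14.5, 24.5, 34.5 as in the calculation above.

```python
def merge_pairs(freqs):
    """Combine adjacent classes two at a time (requires an even number of classes)."""
    if len(freqs) % 2 != 0:
        raise ValueError("need an even number of classes to merge in pairs")
    return [freqs[k] + freqs[k + 1] for k in range(0, len(freqs), 2)]


def estimate_mode(boundaries, freqs):
    """Apply Mo = L + d1/(d1 + d2) * width to the class with the highest frequency.

    boundaries has one more entry than freqs. A missing neighbor at either end
    is treated as a class with frequency zero.
    """
    m = max(range(len(freqs)), key=freqs.__getitem__)
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_prev
    d2 = freqs[m] - f_next
    width = boundaries[m + 1] - boundaries[m]
    return boundaries[m] + d1 / (d1 + d2) * width


# Original classes 5-9 (implied, empty), 10-14, 15-19, 20-24, 25-29, 30-34.
original_freqs = [0, 5, 12, 12, 10, 4]
merged_freqs = merge_pairs(original_freqs)   # classes 5-14, 15-24, 25-34
merged_boundaries = [4.5, 14.5, 24.5, 34.5]

estimate = estimate_mode(merged_boundaries, merged_freqs)
print(merged_freqs, round(estimate, 2))
```

Note that `estimate_mode` picks the first class with the largest frequency, so it is not meant for data with tied modal classes like the original table; that tie is exactly why the classes are merged first.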

If anyone reading this knows an original source for the formula that gives a solid foundation for it, rather than just an ad-hoc linear interpolation, I would love to know.

It’s surprising how many questions we get that end up being about problems that are poorly worded or simply wrong. But these can be as illuminating as good problems, by showing ways to catch the error. This one is simple in itself, but will lead us into a common question about the Side-Side-Angle case of triangle congruence.

Here is the question, from October:

Question: In the figure, prove that ABCD is a parallelogram. Please….

My try:

To prove parallelogram, should show Angle BDC = Angle DBA (alternate angle) and if we are able to show them equal then AB will be parallel to DC and the line joining equal and parallel side are also equal and parallel. But how to prove them equal.

Is there any Method?

This is my try….

In Triangle ODC and OAB

| Statement | Reason |
|---|---|
| 1. DC = AB (s) | given |
| 2. DO = OB (s) | given |

Only SS is possible for me. How to prove that ABCD is a parallelogram.

Yadav is following a good strategy, looking for a way to prove the conclusion (the last step), and then working backward. He sees that if he could prove AB and DC are parallel, then the conclusion would follow from a theorem about parallelograms; but he can’t find a way to get from the givens to that. Showing that triangles ODC and OAB are congruent would supply the needed congruent angles, but the only pairs of congruent parts he sees are two sides.

Examining the problem, I saw that he was missing something, but it was not sufficient:

Hi, Yadav.

Thanks for showing your thinking!

There is also an angle shared by the triangles ABO and CDO. This gives you SSA, which as you probably know does not necessarily imply congruence. I wonder if the problem didn’t say to prove it is a parallelogram, but to determine whether it is? Or were you told more than the picture shows?

I would look for a counterexample. Here are two examples of how to find one:

The SSA condition (Side-Side-Angle: two sides and a non-included angle being congruent) is, in general, not sufficient to prove congruence of triangles; in some cases you can in fact prove that the triangles are not congruent by making a counterexample, and in other cases additional data can make the proof possible. I didn’t yet know for sure that the claim was false (since all I knew was that I saw no sufficient basis to prove this congruence); but it was a reasonable thing to suggest.

Let’s take a look at SSA, first at why it doesn’t work, and then at the additions that can correct it. Then I’ll get back to this question.

Here is another question from our archives (1995), besides the two I referred to above:

Congruency Theorems for Triangles

Given the three methods of proving two triangles congruent: SSS, SAS, ASA. Using SAS, the angle must be a contained angle, correct? Then, two triangles, one which has two sides that are of equal length to the second triangle, and both having an angle (not contained) equal, cannot be proved congruent. It seems to me that they are congruent, though. Any thoughts on this?

What looks right is not always right! Although in some cases it seems as if the SSA condition is sufficient, it is not always.

Doctor Ethan responded:

What you are looking for is an SSA postulate. I am afraid that it just isn't there. It is interesting to see why it can't be; and there are a few special cases in which it actually works. First let's look at one special case and then we'll look at why it doesn't work in general. Are you familiar with the HL postulate of congruency? It is for right triangles, and it says that if the hypotenuse and a leg of two right triangles are congruent, then the triangles are congruent. That essentially is an SSA postulate, except that we require that the A be a right angle. I know that that is a pretty exclusive restriction, but we actually can get a little broader. Before I tell you how broad, let's look at a triangle for which it won't work.

(Ethan is calling these postulates; as I have previously discussed, in different systems they may be theorems, and it is probably better if only one is called a postulate. For our purposes, it doesn’t matter.) We’ll look later at how to make a valid SSA theorem; a counterexample of the basic SSA is the first task:

Here is an experiment for you to try. This should at least give you one example of a place that it won't work, and then you can write back with what you find out and I'll give you another hint about what you can try. Get three sticks, one that is three inches long and two that are five inches long. Use them to make an isosceles triangle. It will look something like this:

[Figure: an isosceles triangle with the two slanted sides marked 5 and the base marked 3.]

Label them like this:

[Figure: the same triangle with the left side labeled a, the right side labeled c, and the base labeled b.]

Now I want you to keep the angle fixed between a and c and move the stick of length 3 in until it forms a triangle again. If you have followed these instructions, you have constructed two triangles that have congruencies SSA but are clearly not congruent triangles.

Roger apparently never replied, so Doctor Ethan didn’t get a chance to elaborate as promised. I will at least finish the counterexample. Here is the triangle he showed:

If the expected SSA theorem were true, then any triangle with the angle and sides marked in red would be congruent to this one. The challenge is, can we make a different triangle that fits those requirements? The answer is, yes: swing the 3-unit side around until it touches the blue side in a different place, like this:

The new triangle DCB is not congruent to ACB, though it has the same three red parts. This shows that the SSA conditions are not sufficient to prove congruence.
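The stick experiment can also be checked with the Law of Cosines. This is a sketch under one assumption taken from Doctor Ethan's instructions: the fixed angle is the one between the two 5-inch sticks, so the 3-inch stick is the side opposite that angle and one 5-inch stick is the side adjacent to it. The function name `third_side_options` is made up for this illustration. It solves for every possible length of the remaining side.

```python
import math


def third_side_options(angle_deg, adjacent, opposite):
    """Possible lengths c of the unknown side in an SSA configuration.

    From the Law of Cosines,
        opposite^2 = adjacent^2 + c^2 - 2*adjacent*c*cos(angle),
    which is a quadratic in c:
        c^2 - 2*adjacent*cos(angle)*c + (adjacent^2 - opposite^2) = 0.
    Only positive roots correspond to triangles.
    """
    half_b = adjacent * math.cos(math.radians(angle_deg))
    discriminant = half_b ** 2 - (adjacent ** 2 - opposite ** 2)
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    return sorted({c for c in (half_b - root, half_b + root) if c > 0})


# Angle between the two 5-inch sides of the 5-5-3 triangle:
# cos(angle) = (5^2 + 5^2 - 3^2) / (2*5*5) = 41/50.
apex_deg = math.degrees(math.acos(41 / 50))

options = third_side_options(apex_deg, adjacent=5, opposite=3)
print([round(c, 6) for c in options])
```

Two positive solutions appear: the original 5, and a second length of about 3.2. So the same angle, the same 5-inch side, and the same 3-inch side fit both a 5-5-3 triangle and a 5-3.2-3 triangle, which are certainly not congruent.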

(Another student, Nathan, wrote in 1999 asking for the answer, and Doctor Rob provided a different example, with all integer sides, and showed how to make others. This is included in the same page.)

We have another such demonstration here:

Why There Is No SSA Congruence Postulate or Theorem

Back to our original problem, Yadav wrote back answering my question:

Yes Sir. There was a question with a figure.

The question was:

- In the figure, AB=CD and BO=DO. Then prove that ABCD is a parallelogram.
What is the answer sir please reply me?

It appears to be definitely claiming that the statement is provable. Now, can we make a counterexample (so that we don’t need to try to prove the claim)? Or is it in fact true? I used those counterexamples to SSA as a model for what we needed:

The answer, as I suggested, is that ABCD is not necessarily a parallelogram. You can’t prove a false statement!

Here is a counterexample, in which the conditions are met but the conclusion is false. Did you read the links I provided, and try doing what they said?

Here, ABCD is a quadrilateral with AB = CD and BO = DO. But it is not a parallelogram, so the claim is false.

Here is how I constructed my counterexample; you should be able to see its inspiration:

I first made a parallelogram A’BCD, then swung A’ in a circle to find A (which required moving parts around until that circle met line A’C as needed). As a result, triangles ODC and OAB are not congruent (as ODC and OA’B are), so AB is not parallel to DC.

You may notice that in the figure originally given, this can’t happen because of the particular angles used; that could either be intentional, to make it harder for the student to discover that the theorem is false, or accidental, the reason its creator didn’t realize it.

Let’s get back to SSA. What additional conditions would it take to make a valid theorem? This is something Doctor Ethan said he would talk about, but never got the chance.

Here is a question from 2001, asking about such a valid theorem:

SSA Proof How do I prove the hypotenuse-leg congruence for triangles? My professor suggested using the S.S.A. theorem

The HL theorem, as Doctor Ethan mentioned, is a special case of SSA, where the two sides are the hypotenuse and a leg of a right triangle, and the angle is the right angle. This professor is evidently referring to some form of SSA theorem that is valid; what is it?

Doctor Rob discussed such a general theorem:

If two triangles have one angle, one adjacent side, and the opposite side equal, then they may or may not be congruent. That means that using S.S.A. is tricky. Suppose, for example, you were given acute <A and sides b and a. Then the picture might look like this:

This is how you might start out drawing a triangle, given its ∠A, b, and a (not yet drawn).

If you know some trigonometry, you'll realize that d = b*sin(<A). Now draw a circle with radius a and center C. Point B must lie on line AX and on the circle. There are several cases:

Case 1: a < d. The circle and line don't intersect, and there are no triangles with these parameters.

Note that this case will never occur in congruence proofs, because neither triangle exists!

Case 2: a = d. The circle and line are tangent at X, so B = X, and there is just one triangle with these parameters. In this case, <ABC is a right angle.

This case will not show up in the typical proof, because it requires information about the relationship between S, S, and A that requires either knowledge of the other angle, or of trigonometry.

Case 3: d < a < b. The circle and line intersect in two points B and B', with B' between A and X, and B to the right of X. Now there are two triangles with these parameters, ABC and AB'C, and you can't prove congruence.

This is often called the “ambiguous case” in trigonometry, and is like our counterexamples above.

Case 4: b <= a. The circle and the line intersect in two points B and B', but B' is at or to the left of A, so the angle drawn above is an exterior angle to that triangle. Thus there is just one triangle with these parameters, and you can prove congruence.

These cases all assumed that ∠A is acute.

If <A is a right angle, and so d = b, the following cases hold:

Case 5: a <= b. No triangle.

Case 6: b < a. There is just one triangle, and you can prove congruence.

This is the HL case.

If <A is obtuse, then the following cases hold:

Case 7: a <= b. No triangle.

Case 8: b < a. There is just one triangle, and you can prove congruence.

Thus the theorem should say,

THEOREM (S.S.A.): If two triangles ABC and DEF have BC congruent to EF, AC congruent to DF, and <A congruent to <D, then:

1. if BC >= AC, the triangles are congruent.
2. if BC = AC*sin(<A), then the triangles are congruent.
3. if AC*sin(<A) < BC < AC, then the triangles may or may not be congruent.

In your case, you know that the hypotenuse BC is longer than either leg, including AC, so Part 1 of the above theorem applies.
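Doctor Rob's eight cases can be collected into one small function. This is only a sketch: the name `ssa_triangle_count` is made up, a is the side opposite the given angle A, b is the side adjacent to it, and the tangent case a = d is detected with `math.isclose` because exact equality is unreliable in floating point.

```python
import math


def ssa_triangle_count(angle_a_deg, b, a):
    """Number of distinct triangles with angle A, adjacent side b, opposite side a."""
    if not 0 < angle_a_deg < 180 or a <= 0 or b <= 0:
        raise ValueError("need 0 < A < 180 and positive side lengths")

    if angle_a_deg >= 90:
        # Cases 5-8: right or obtuse angle.
        return 1 if a > b else 0

    d = b * math.sin(math.radians(angle_a_deg))  # distance from C to line AX
    if math.isclose(a, d):
        return 1  # Case 2: circle tangent to the line, right angle at B
    if a < d:
        return 0  # Case 1: circle misses the line
    if a < b:
        return 2  # Case 3: the ambiguous case
    return 1      # Case 4: b <= a


# One sample per case, using A = 30 degrees and b = 10 for the acute cases.
for angle, b, a in [(30, 10, 4), (30, 10, 5), (30, 10, 7), (30, 10, 12),
                    (90, 5, 3), (90, 3, 5), (120, 5, 3), (120, 3, 5)]:
    print(angle, b, a, ssa_triangle_count(angle, b, a))
```

A count of 1 is exactly the situation in which SSA information is enough to prove congruence; a count of 2 is where counterexamples come from.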

So, in order to apply SSA to our parallelogram proof, we would need to know that AB > OB (as in the original picture), or that \(\displaystyle \frac{\ \overline{AB}\ }{\overline{OB}} = \sin(\angle AOB)\).
