We’ll start with this question from 1997:

Root Multiplicity and Polynomial Functions What effect does multiplicity have on a polynomial function? Here's an example of what I mean by multiplicity: (x+1)(x-2)^2 where -1 has a multiplicity of 1 and 2 of 2. I can't figure out what effects multiplicities have, although odd multiplicities appear to make the function parallel to the x axis for a while, like x(x-2)^9. But 3 is not parallel to the x axis, nor is 1. The higher the multiplicity, the farther the function appears to travel along the x axis, but only up to a point. Can someone please help?

The **multiplicity of a zero** of a polynomial is the degree of the corresponding factor; if there is a factor \((x-a)^m\), then we say that *a* is a zero of multiplicity *m*. When the multiplicity is 1, the graph crosses straight through the *x*-axis, but for higher multiplicities, it briefly flattens out, as Alex has observed.

Here are the graphs of Alex’s two functions:

There, the zero of multiplicity 2, at \(x=2\), is **momentarily horizontal**, touching the *x*-axis (and then turning back, as we’ll see, because 2 is even).

With multiplicity 9, again at \(x=2\), it appears to be (nearly) horizontal for some distance (most of the way from \(x=1\) to \(x=3\), as he pointed out); it is actually horizontal only momentarily, but is very close to horizontal for much farther. (Then it crosses over, because 9 is odd.)

It still looks pretty horizontal (though not quite as far) if we don’t stretch the scale vertically:

Doctor Jerry answered:

Hi Alex, You have done well in thinking about the effect on the graph of multiple roots. If y = q(x)*(x-a)^k, where q(x) is the rest of the polynomial we are considering, if k is large, then the factor (x-a)^k will be very small within 1 of a. This means that the graph will be quite flat in the interval (a-1,a+1). Of course, if k is only 1 or 2, the flatness is limited to a short interval about a.

The idea is that when \(|x-a|\) is less than 1, raising it to a power will make it smaller. Closer to *a*, this factor will be *very* small; for instance, for \(|x-a|<0.5\), \(|x-a|^9<0.002\). So it is, indeed, very flat. Lower powers are less effective; for \(|x-a|<0.5\), \(|x-a|^2<0.25\), but it is still flat when you get closer: for \(|x-a|<0.05\), \(|x-a|^2<0.0025\). This matches what we see on the graphs.
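As a quick numeric check of those bounds, here is a minimal Python sketch. The helper name `max_factor` is only illustrative and is not part of the original answer; it evaluates \(|x-a|^k\) at the edge of each interval, which is the largest value the factor reaches there.

```python
def max_factor(radius, k):
    """Largest value of |x - a|**k when |x - a| <= radius (for radius >= 0)."""
    return radius ** k

# The three cases mentioned in the text: (radius, exponent)
for radius, k in [(0.5, 9), (0.5, 2), (0.05, 2)]:
    bound = max_factor(radius, k)
    print(f"|x-a| <= {radius}, k = {k}: |x-a|^k <= {bound:.6f}")
```

The ninth power is already below 0.002 half a unit away from the zero, while the square needs to be ten times closer to get comparably small.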

Note that when \(k=1\), there is really no flattening at all; that was an accidental misstatement.

When k is even, the factor (x-a)^k doesn't change sign as x moves from the left of a to the right of a (assuming that q(x) doesn't have roots very close to a). So, the graph will be tangent to the x-axis. For odd k, the graph crosses the x-axis, even though it may be quite flat.

For even multiplicity, books often say the graph “**touches and turns**”, being either positive on both sides, or negative on both sides. For odd multiplicity, it “**crosses**”; I describe it as “**pausing** before continuing in the same direction”.

Here is a related question from 2004:

Graphing Polynomial Functions in Factored Form Why does even multiplicity cause a graph to touch an intercept and odd multiplicity cause a graph to cross it?

We saw this in the first graph above, where the zero at \(x=-1\) had odd multiplicity (1) and crossed the *x*-axis from below to above, while the zero at \(x=2\) had even multiplicity (2) and touched from above without crossing. But we only “touched” on this fact there.

Doctor Vogler answered:

Hi Shannon, Thanks for writing to Dr Math. Suppose we have a polynomial f(x) = (x - r)^m * (other terms) with a root r of multiplicity m. So the other terms are not near zero when x is near r. That is, if you look close enough to r, the other terms will be close to some nonzero number, either positive or negative. Or, said differently, the limit as x approaches r of those other terms is some nonzero real number. If it was zero, then we could factor out another x - r from those other terms, and the root r would have multiplicity m + 1.

He is using the word “term” where I would say “factor”; the main point is that the multiplicity must be the highest degree you can put on that factor, so the “other terms” (Doctor Jerry’s \(q(x)\)) will not be zero **at** *r*, and will be close to some non-zero number anywhere **near** *r*.

For example, if our function is $$f(x)=x(x+1)^2(x-2)^3$$ and we are focusing on the zero at \(x=-1\), with multiplicity 2, we are looking at it as $$f(x)=\left[x(x-2)^3\right](x+1)^2$$ so that the “other terms” are \(x(x-2)^3\), which is near \((-1)((-1)-2)^3=27\) when *x* is near \(-1\). Here is the graph, which crosses the axis at \(x=0\) and at \(x=2\), but only touches at \(x=-1\):

So then when x = r, f(x) = f(r) = 0. That causes the graph to touch the x-axis at (r,0). Now when x is just larger than r (that is, just to the right of r), then x-r will be positive, and (x-r)^m will also be positive. So f(x) will have the same sign as the product of those other terms. But when x is just smaller than r, then x-r will be negative, and (x-r)^m might be negative and might be positive. In fact, (x-r)^m will be negative if m is odd, and (x-r)^m will be positive if m is even. So if m is even, then f(x) will have the same sign as those other terms, which means it approaches the 0 at f(r) but has the same sign on both sides of r, so it touches and does not cross. But if m is odd, then f(x) will have the opposite sign as those other terms, which means it has a different sign on each side of r, so it crosses the axis.
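The sign argument can be tried out numerically on the example \(f(x)=x(x+1)^2(x-2)^3\) from above. This is only a sketch: the function name `behavior_at` and the step size `eps` are my own choices for illustration, and `eps` must be small enough that no other zero lies within that distance.

```python
def f(x):
    """The example polynomial f(x) = x (x+1)^2 (x-2)^3."""
    return x * (x + 1) ** 2 * (x - 2) ** 3

def behavior_at(root, eps=1e-3):
    """Classify a zero by comparing the signs of f just left and just right of it."""
    left, right = f(root - eps), f(root + eps)
    return "crosses" if (left > 0) != (right > 0) else "touches"

for root in (-1, 0, 2):
    print(f"x = {root}: {behavior_at(root)}")
```

The zero of even multiplicity (at \(-1\)) keeps the same sign on both sides, while the zeros of odd multiplicity (at 0 and 2) change sign.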

He hasn’t explicitly talked here about the flattening aspect (for multiplicities greater than 1), just the sign aspect. We’ll put everything together next.

Here is the question I find most interesting, from 2014:

Curvy ... and Topsy-Turvy? When sketching curves of higher degree polynomials, we can use factorisation to look at the parts of the equation that make up the whole polynomial. Therefore, we can look at the curve in terms of smaller curves, especially on the x-axis (the roots). For example, consider the function y = (x - 1) * (x + 3)^2 * (x - 2)^3 * (x + 1)^4 Its curve will cross the x-axis like a linear function at 1, like a parabola at -3, like a cubic at 2, and like a quartic at -1. When we do this, we get the shape of the curve. It begins in the first quadrant and ends in the second because it is positive. But when we look at the roots of the equation, the linear function formed at (1, 0) looks like the equation y = -x - 1 (negative) rather than y = x - 1 (as in the original term). Why is this? Even though the original term was (x - 1), the graph of the entire function looks negative near x = 1. Similarly, in other regions, the signs of the roots seem opposite to the roots of the individual terms. What part of the equation determines this?

Here is the graph of his function, with the zeros marked:

What Sam has been taught, evidently, is that near an *x*-intercept (zero, or root), a polynomial behaves in some sense like a polynomial whose degree matches the multiplicity of the zero (the exponent on the corresponding factor). Here are graphs of polynomials of first, second, third, and fourth degree, for comparison:

But what does “like” mean?

Doctor Greenie answered, taking “like” to mean “exactly like”, according to Sam’s evident expectation:

Hi, Sam -- What you are saying is not true at all. The factorization of the polynomial tells us where the roots are, and the powers of the factors tell us the general behavior of the function at the zeros. But it appears that you are saying that the function in your example, because of the factor (x - 1), should look like the linear function y = x - 1 at x = 1. That is not at all true. If it were, then the slope of the function at every zero would be positive, because the factors are all of the form (x - a), not (-x - a), and that would not be possible.

As Sam had observed, the function does not look like the corresponding power function in this sense.

I, writing at the same time, saw “like” more broadly:

Hi, Sam. An excellent question! Let's look at the factor (x - 1) in its context:

y = (x - 1) * (x + 3)^2 * (x - 2)^3 * (x + 1)^4

Since we're looking at the curve NEAR x = 1, all the other factors will be near the value they take at x = 1. (A small change in x will produce a large relative change in (x - 1), since it is near zero, but only a small relative change in the other factors.) If we replace x with 1 in those other factors, we have

y ≈ (x - 1) * (1 + 3)^2 * (1 - 2)^3 * (1 + 1)^4 = (x - 1)(16)(-1)(16) = -256(x - 1)

That factor of -256 answers your question. The specific linear function that approximates the curve near x = 1 is y = -256(x - 1), with a negative slope.

I’m not sure I had ever taken the idea quite this far before. But it was implied in the other two answers above, if I had thought about it.

Here is what this linear function looks like, in comparison to the polynomial:

This line is tangent to the curve; if you zoomed into the graph, it would look almost identical.

Similarly, if we replace *x* in all factors except \((x+3)^2\) with \(-3\), we get \(y=8000(x+3)^2\), which looks like this:

This is not only tangent to the curve (that is, horizontal) at the given point, but remains close for points nearby; here is what it looks like if we zoom in:

Here are the results if we do the same with the other two zeros:

You can see that the flatness at \(-1\) is because \(y=(x+1)^4\) is flattened. Multiplying by 216 doesn’t change that much.
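The four coefficients quoted here (\(-256\), 8000, 2025, and 216) can be reproduced with a short Python sketch. The names `FACTORS` and `local_coefficient` are just labels I chose for this illustration; the method is the one described above, substituting the zero into every other factor.

```python
# Zeros of y = (x-1)(x+3)^2(x-2)^3(x+1)^4, mapped to their multiplicities.
FACTORS = {1: 1, -3: 2, 2: 3, -1: 4}

def local_coefficient(h):
    """Product of all the *other* factors, evaluated at the zero h."""
    a = 1
    for root, mult in FACTORS.items():
        if root != h:
            a *= (h - root) ** mult
    return a

for h, n in FACTORS.items():
    print(f"near x = {h}: y ≈ {local_coefficient(h)} * (x - ({h}))^{n}")
```

The sign of each coefficient gives the direction of the local shape, and its exponent gives the kind of shape.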

I wanted to let him do this for himself … but then I couldn’t resist showing my results in case he didn’t:

Try doing the same with the other x-intercepts, and graph these approximations on the same axes as the curve. I think you'll like what you see! Here is my graph:

After seeing Doctor Greenie’s answer, I added more:

Hi, Sam. I have a clarification to make. I tell my students that a polynomial near any zero h "looks like" the corresponding factor (x - h)^n, in the broad sense that if the factor is cubed, it looks like a cubic, and so on. This is what you said initially ("like a cubic"). A cubic, of course, can look either like x^3 or like -x^3, which is where your question arose.

So the issue is what we mean by “a” cubic, more than on the meaning of “looks like”.

In a more exact sense, it looks like a(x - h)^n, where the coefficient "a" has to be determined as I showed you, by putting x = h into the other factors. If you want to get the direction right, as well as the overall shape, you have to at least determine the sign of this coefficient.

This amounts to finding the sign of the entire polynomial for *x* slightly greater than the zero, which determines the effective “leading coefficient”.

So the answer to your specific question -- about what part of the expression determines the direction -- is: "all the rest."

So we are back to the “other terms” we’ve been seeing all along.

The factor of interest determines the degree of the shape we are looking for, and the sign of all the other factors determines the direction of the shape. This is the same as the “other terms” or “q(x)” mentioned in the first two answers. I continued:

The graph I shared yesterday showed this for all four zeros, carrying out the plan I suggested. Note how, with the coefficients calculated correctly, the graph looked very much like the corresponding functions; in fact, no polynomial could more closely "snuggle up to" the curve at those points than these. They are called "osculating curves" (from the Latin for "kiss"): http://mathworld.wolfram.com/OsculatingCurves.html

That page says osculating curves have the same **value**, **first derivative**, and **second derivative** at the given point, so that they have the same **location**, **slope**, and **curvature**. Let’s check that out for our example at \(x=2\); this will require a little calculus:

$$f(x)=(x-1)(x+3)^2(x-2)^3(x+1)^4\\=x^{10}+3x^9-13x^8-37x^7+57x^6+157x^5-71x^4-279x^3-46x^2+156x+72$$

$$f'(x)=10x^9+27x^8-104x^7-259x^6+342x^5+785x^4-284x^3-837x^2-92x+156$$

$$f^{\prime\prime}(x)=90x^8+216x^7-728x^6-1554x^5+1710x^4+3140x^3-852x^2-1674x-92$$

$$f^{\prime\prime\prime}(x)=720x^7+1512x^6-4368x^5-7770x^4+6840x^3+9420x^2-1704x-1674$$

At \(x=2\),

$$f(2)=(2)^{10}+3(2)^9-13(2)^8-37(2)^7+57(2)^6+157(2)^5-71(2)^4-279(2)^3-46(2)^2+156(2)+72=0$$

$$f'(2)=10(2)^9+27(2)^8-104(2)^7-259(2)^6+342(2)^5+785(2)^4-284(2)^3-837(2)^2-92(2)+156=0$$

$$f^{\prime\prime}(2)=90(2)^8+216(2)^7-728(2)^6-1554(2)^5+1710(2)^4+3140(2)^3-852(2)^2-1674(2)-92=0$$

$$f^{\prime\prime\prime}(2)=720(2)^7+1512(2)^6-4368(2)^5-7770(2)^4+6840(2)^3+9420(2)^2-1704(2)-1674=12,150$$

Those all agree with the derivatives of \(g(x)=2025(x-2)^3\), whose first three derivatives are

$$g'(x)=6075(x-2)^2\;\;\;g'(2)=0$$

$$g^{\prime\prime}(x)=12,150(x-2)\;\;\;g^{\prime\prime}(2)=0$$

$$g^{\prime\prime\prime}(x)=12,150\;\;\;g^{\prime\prime\prime}(2)=12,150$$

So our cubic “hugs” as tightly as a cubic possibly can: its value and its first three derivatives *all* match those of \(f\) at that point. We might call it a “hyper-osculating” curve.

And we can easily see why, if we use the product rule to find the derivatives:

$$f(x)=\left[(x-1)(x+3)^2(x+1)^4\right]\cdot(x-2)^3$$

$$f'(x)=\left[\text{6th degree polynomial}\right]\cdot(x-2)^3+\left[\text{7th degree polynomial}\right]\cdot3(x-2)^2$$

which is zero at \(x=2\);

$$f^{\prime\prime}(x)=\left[\text{5th degree polynomial}\right]\cdot(x-2)^3+\left[\text{6th degree polynomial}\right]\cdot3(x-2)^2\\+\left[\text{6th degree polynomial}\right]\cdot3(x-2)^2+\left[\text{7th degree polynomial}\right]\cdot6(x-2)$$

which is zero at \(x=2\), because every term has \(x-2\) as a factor.

Then the next derivative will have one term that does not have that factor; since that “7th degree polynomial” is just the “other terms” \((x-1)(x+3)^2(x+1)^4\), it will be $$f^{\prime\prime\prime}(x)=\left[\text{stuff with }(x-2)\right]+\left[(x-1)(x+3)^2(x+1)^4\right]\cdot6$$ and $$f^{\prime\prime\prime}(2)=\left[(2-1)(2+3)^2(2+1)^4\right]\cdot6=2025\cdot6=12,150$$ just as before.
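For anyone who would rather not expand and differentiate a tenth-degree polynomial by hand, the check can be repeated in exact integer arithmetic. The following is a minimal sketch using plain coefficient lists (constant term first); the helper names are my own and not from any library.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_from_roots(roots_with_mult):
    """Expand the product of (x - r)^m over the given (r, m) pairs."""
    p = [1]
    for r, m in roots_with_mult:
        for _ in range(m):
            p = poly_mul(p, [-r, 1])
    return p

def derivative(p):
    """Coefficient list of the derivative of p."""
    return [k * c for k, c in enumerate(p)][1:]

def evaluate(p, x):
    """Evaluate p at x by Horner's rule."""
    total = 0
    for c in reversed(p):
        total = total * x + c
    return total

f = poly_from_roots([(1, 1), (-3, 2), (2, 3), (-1, 4)])   # the full polynomial
g = [2025 * c for c in poly_from_roots([(2, 3)])]         # 2025 (x - 2)^3

derivs_f, derivs_g = [f], [g]
for _ in range(3):
    derivs_f.append(derivative(derivs_f[-1]))
    derivs_g.append(derivative(derivs_g[-1]))

values_f = [evaluate(p, 2) for p in derivs_f]   # f, f', f'', f''' at x = 2
values_g = [evaluate(p, 2) for p in derivs_g]   # g, g', g'', g''' at x = 2
print(values_f, values_g)
```

Both lists agree, and the list `f` reproduces the expanded coefficients shown above.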


Here is the question, from late April, with two parts:

I have seen a statement –

All polynomial functions of odd order have at least one zero, while polynomial functions of even order may not have a zero.

What does this statement mean? What is the relation between order of polynomial and no. of real zeroes in a polynomial?

I have seen a formula in a YouTube video which I have not seen in any textbook and the formula is

No. of turning points in a polynomial graph = no. of zeros + 1 – no. of even zeros.

Is this formula correct?

I know that maximum no of turning points possible for a polynomial of degree n is (n-1) and this is self-evident. But how the above formula comes? Plz help.

The **order** (or **degree**) of a polynomial is the greatest exponent on the variable. A **zero** of a function is an input for which the function’s value is zero; on a graph, a *real* zero is an *x*-intercept. A **turning point** on a graph is a point where it changes from increasing to decreasing, or vice versa.

Doctor Rick answered, first dealing with the first paragraph:

Hi, Debarghya.

The number of real zeros of a polynomial is less than or equal to the degree (order) of the polynomial. For instance, f(x) = x^2 – 1 (order 2) has two real zeros; g(x) = x^2 has one zero (of multiplicity 2); and h(x) = x^2 + 1 has no real zeros.

Note that Debarghya is talking about graphs, and therefore ignoring non-real zeros, which are not visible on a (real-valued) graph. That will eventually become an issue!

Here are the graphs of those three functions (f, g, h), showing how you can get 2, 1, or 0 (real) zeros:

These all have one turning point. You can see how an even degree makes it possible to have no (real) zeros.

There are (at least) two good ways to understand why polynomials of odd degree have at least one zero, depending on what you have learned about polynomials.

One way is to think about the graph of a polynomial P(x). As x gets far from zero, the behavior of the graph is dominated by the leading term. If the leading coefficient is positive, then x^n will increase without limit as x increases, and hence P(x) > 0 for sufficiently great x. On the other hand, as x decreases (becomes increasingly negative), x^n will approach negative infinity for odd n. Since a polynomial is a continuous function, its graph must cross the x axis somewhere, in order to change from negative to positive. Thus there must be at least one zero of the function (x-intercept of the graph).

If a continuous graph is below the *x*-axis at one end, and above the *x*-axis at the other, then it must cross at least once.
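The same reasoning also yields a way to locate such a zero: find a negative value on the left and a positive value on the right, then repeatedly halve the interval. Here is a minimal bisection sketch under stated assumptions: the polynomial \(x^5-3x+7\) is an arbitrary example of mine, and the function assumes odd degree with a positive leading coefficient.

```python
def P(x):
    """An arbitrary odd-degree example: x^5 - 3x + 7."""
    return x**5 - 3 * x + 7

def find_real_zero(poly, tol=1e-12):
    """Bracket a sign change by doubling, then bisect.

    Assumes poly has odd degree and a positive leading coefficient, so that
    poly(-b) < 0 < poly(b) once b is large enough.
    """
    b = 1.0
    while not (poly(-b) < 0 < poly(b)):
        b *= 2
    lo, hi = -b, b                 # poly(lo) < 0 <= poly(hi) is kept throughout
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poly(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

zero = find_real_zero(P)
print(zero, P(zero))
```

Bisection finds one zero, not necessarily all of them; it only relies on continuity and the opposite signs at the two ends.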

Here are two polynomials with degree 9, one crossing only once, the other more:

The red one has 7 zeros and 8 turning points (two of which are also zeros); the blue one has one zero and 4 turning points. Both illustrate the fact that the number of **zeros** is at most *n*, and the number of **turning points** is at most \(n-1\).

The other way is to use the Fundamental Theorem of Algebra: that every polynomial equation of degree n with complex number coefficients has n roots, or solutions, in the complex numbers. We’re talking, I assume, about real coefficients, in which case it is proved further that any non-real roots must come in complex-conjugate pairs, that is, if a+bi is a root then a-bi is also a root. From this we can see that the number of real roots (counting multiplicity) must have the same parity (odd or even) as the degree of the polynomial. Since zero is an even number, an odd-degree polynomial can’t have zero real roots; it must have at least one.

Equivalently, a polynomial with odd degree can be factored into an odd number of linear factors (corresponding to zeros); and any factors corresponding to *non-real* zeros must come in pairs, so that removing them leaves an odd number of factors representing real zeros. Therefore, there can’t be zero real zeros.

The Fundamental Theorem of Algebra implies that the number of **zeros** of a polynomial is no more than its degree; and when applied to the derivative (a concept from calculus) it implies that the number of **turning points** (which are zeros of the derivative) is no more than one less than the degree. So all these claims are easy to prove.

Now, about the video and its formula:

I don’t know what is meant by “even zeros” in that formula. My guess would be “zeros of even multiplicity”, but the formula is not valid with that meaning. For example, here is the graph of f(x) = (x – 2)(x – 1)x(x + 1)(x + 2) + 5; it has one zero, but four turning points.

Perhaps you could show us where you saw the formula.

This example has degree 5, with one (odd) zero and four turning points; it is not true that “No. of turning points = no. of zeros + 1 – no. of even zeros,” which would give \(T=Z+1-E=1+1-0=2\), rather than the correct answer, 4.
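Without the graph in front of us, the two counts can be estimated by sampling. The sketch below counts sign changes of \(f\) (for zeros) and of \(f'\) (for turning points) on a fine grid. The interval \([-5, 5]\) and the grid size are assumptions I picked for this example, and sign counting would miss any zero of even multiplicity (this polynomial has none).

```python
def f(x):
    """Doctor Rick's example: (x-2)(x-1)x(x+1)(x+2) + 5."""
    return (x - 2) * (x - 1) * x * (x + 1) * (x + 2) + 5

def f_prime(x):
    """Derivative of the expanded form x^5 - 5x^3 + 4x + 5."""
    return 5 * x**4 - 15 * x**2 + 4

def sign_changes(func, lo=-5.0, hi=5.0, steps=20000):
    """Count sign changes of func over a uniform grid on [lo, hi]."""
    count, prev = 0, None
    for i in range(steps + 1):
        y = func(lo + (hi - lo) * i / steps)
        if y == 0:
            continue               # skip exact zeros; the next sample decides
        sign = y > 0
        if prev is not None and sign != prev:
            count += 1
        prev = sign
    return count

zeros = sign_changes(f)                  # real zeros where the graph crosses
turning_points = sign_changes(f_prime)   # places where the slope changes sign
print(zeros, turning_points, zeros + 1 - 0)   # last value: Z + 1 - E with E = 0
```

The sampled counts agree with the graph (one zero, four turning points), while the formula as stated predicts 2.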

While we waited for more information, I did some searching:

I found what I presume is the source video,

How to find number of turning points of polynomial using Formula from equations with 5 examples

Here is the final screenshot showing all his examples:

We see, first, that the formula was misstated above, with the wrong signs; but it is still not true for Doctor Rick’s example, taken at face value: \(Z-1+E=1-1+0=0\), which is not the number of turning points, 4.

But watching the video, we see that his numbers at the right are not quite what he says they are:

For examples b and c, which are like Doctor Rick’s example, he ignores the added constants! So he is not really counting zeros of the given polynomial, but zeros of a shifted polynomial that has a full number of real zeros. This makes his claim work (when interpreted as he does) for Doctor Rick’s example.

But he doesn’t even make an attempt to show that his results are true! That means this is not mathematics. (I think all his examples are correct, but they do not exhaust the possibilities.)

I don’t know exact conditions under which what he says is correct, but I can see at least one case where he is wrong, based on something like his example d.

For example, here is his **Example b**, \((x-2)^2(x+3)^4+2\):

This has no zeros; the gray dots at the bottom are turning points with \(y=2\). By ignoring the “\(+2\)”, he shifts it down so that it has even zeros at \(-3\) and \(2\). The function as written has three turning points, but no zeros, so the formula would give \(Z-1+E=0-1+0=-1\). Shifting changes it to \(Z-1+E=2-1+2=3\).

Here is the graph of **Example d**:

It has one turning point and one (even) zero, making \(1-1+1=1\). But it also has two non-real zeros (\(i\) and \(-i\)).

Here is a polynomial like Doctor Rick’s example:

y = (x-1)(x+1)^2 + 3

Mr. Kumar would get the right answer: The form (x-1)(x+1)^2 has 2 zeros, one of them even, so T = Z – 1 + E = 2 – 1 + 1 = 2.

Here is the graph of the shifted function, \((x-1)(x+1)^2\):

He gets the right answer only because of the unstated twist to his formula, that he allows himself to ignore an added constant, in effect shifting the function up or down to meet some unstated requirements.

But a polynomial can be written in more than one way; if the formula is valid, it should work regardless of how the function is written! We can expand my function to $$y=(x-1)(x+1)^2+3=x^3+x^2-x+2,$$ and then factor it, so there is no additive constant to ignore:

But it can also be expanded and factored as

y = (x + 2)(x^2 – x + 1),

and if we apply the same formula to this, we see one real zero (and two non-real zeros), giving T = 1 – 1 + 0 = 0. This is obviously wrong.

So he is either making a totally unsupported claim (which, in mathematics, amounts to lying), or omitting the conditions under which his formula would work.

A formula that only works under unstated conditions can’t be called true, because one can’t even *try* to prove it. This is why theorems are always stated as precisely as possible.
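The counts for the cubic \(y=x^3+x^2-x+2=(x+2)(x^2-x+1)\) discussed above can be confirmed with nothing more than the quadratic discriminant. This is a small sketch; `real_root_count` is a helper name of my own.

```python
def real_root_count(a, b, c):
    """Number of distinct real roots of a*x^2 + b*x + c, assuming a != 0."""
    disc = b * b - 4 * a * c
    return 2 if disc > 0 else 1 if disc == 0 else 0

# Real zeros: x = -2 from the linear factor, plus any real roots of x^2 - x + 1.
real_zeros = 1 + real_root_count(1, -1, 1)

# y' = 3x^2 + 2x - 1. Two distinct real roots mean y' changes sign twice,
# so there are two turning points in that case.
turning_points = real_root_count(3, 2, -1)

formula = real_zeros - 1 + 0      # Z - 1 + E, with no zeros of even multiplicity
print(real_zeros, turning_points, formula)
```

The factored form has one real zero and the curve has two turning points, so the formula's value of 0 cannot be right for this way of writing the function.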

Debarghya replied, first to Doctor Rick, and then to my comments on the video:

The first approach to the question ‘polynomial functions of odd order have at least one zero’ is straightforward and I must have understood that. it was my fault. but the approach using fundamental theorem of algebra is new to me. I know this theorem but not in detail. I will study more about this theorem to understand this approach.

Yes, this is the video that I referred to.

There is no solid logic for this formula and hence cannot be trusted. Thanx again for your constant guidance.

I looked a little further, and realized that the error was already known, and an attempt had been made to specify conditions:

I just noticed that in the description of the video, it says,

CAUTION: The formula discussed for the number of turning points will not work when we have imaginary roots. It always works for Real roots. Thanks

It looks like this results from a discussion in the comments:

It does seem clear that he never made an attempt to prove his claim, but based it entirely on examples. This is unfortunate, considering his self-description as having a “Passion of sharing and guiding students to understand mathematics concepts and solve problems using different strategies”.

Here is the graph of commenter Eric’s example:

We could, in principle, rewrite this in a form that Mr. Kumar’s method would work with, by adding 20 (so that the *x*-axis intersects the curve between the turning points), factoring that, and subtracting 20. Unfortunately, the factoring would involve irrational numbers.

It seems clear that Mr. Kumar obtained his formula by extrapolating from a collection of nice graphs (the sort we give students to work with because everything is visible), and didn’t try cases like this where complex roots distort the graph, making extra turns.

But what of his new claim, that the formula always works “for real roots”? Presumably he means, “**when all the zeros of the polynomial are real**”. In Eric’s example, it is easy to see from the equation he gives that there are non-real zeros, since one factor is the irreducible polynomial \(x^2+1\); but the same is true of Example d, where the formula happened to work.

It seems that his real claim is that when a polynomial is written **in factored form, with only linear factors**, the formula applies; and since **adding a constant** doesn’t change the number of turning points, it can also be applied when a constant is added to such a product of factors, by ignoring that constant.

Debarghya agreed:

I have just seen the comments too…and truly surprised… I have to be more cautious while seeing videos…although I always judge every step while seeing any solution and seek your help whenever there is any doubt…but there are so many people out there who regularly see many such videos available free on YouTube…in that respect it has dangerous impact on learning mathematics…

In the comment section I see this part

There are many good teachers on YouTube and elsewhere, but I worry when I see how many students’ first thought on seeing a problem they can’t immediately solve is to search for random people who claim to have answers, rather than sticking with one or two trustworthy teachers. (And even then, the best of us make mistakes from time to time, so your habit of checking is a good one.)

I responded:

I hadn’t noticed that comment. It gives me a couple thoughts:

First, he thought he had derived a valid formula, but what he had done was not a proof, and he apparently didn’t realize that. This is why I tend to avoid answering questions with a definite answer if I haven’t proved it! Even in what I’ve said to you, you may observe that I said, “I think all his examples are correct, but they do not exhaust the possibilities.” That is because I didn’t want to take the time to graph each of his functions and confirm his answers, even though I had thought about each of them and was reasonably sure! I also said, “I don’t know exact conditions under which what he says is correct.” I had some thoughts about that, but …

Mathematicians, at least in principle, tend to have a strong sense of their own fallibility, and avoid making strong statements without proof. (That sometimes bothers my friends in non-mathematical discussions, when I avoid committing myself!)

For some related ideas, see Why Do We Need Proofs?

Second, there are two ways you might use this as a learning opportunity. One is to ponder how he might have developed his formula from his examples. Why, given some assumptions about the form of a polynomial (in factored form, with linear factors so you know all the real roots) would his formula work? I haven’t considered it in detail (yet), but after trying a couple of the examples, I had a general sense that it was reasonable under some conditions. Can you see that? (If Mr. Kumar had told that story, rather than just tell people that his formula is true – in several videos, by the way, not just this one! – then it would have pedagogical value.)

In teaching, we sometimes just want students to learn **facts** (and that seems to be the focus in some educational systems); but what we should be primarily teaching is **how to think**; and telling the story of his discovery, in a way that would also reveal its limitations, would be far more useful than a formula – even if it did always work.

Another thing to do is to try extending the idea (as Eric Bian did, but failed), to determine either what the exact conditions are, or how pairs of complex roots might fit in. It happens that our post John Conway on Thinking and Teaching sort of touches on this issue, and you may find parts of it interesting.

In particular, that post demonstrates how a master teacher told a story (showing his actual thinking as it happened, including the occasional error) rather than giving an answer. But the context is a discussion of how complex roots reveal themselves on a graph.

Debarghya found another of Mr. Kumar’s videos:

Here is Mr. Kumar’s observation

Here he is doing something closer to telling his discovery story, but unfortunately he doesn’t notice that all his examples have turning points separated by zeros, and he ends with “You’ll *always* get the right results.”

I answered:

His thinking, as I expected, is entirely based on extrapolation from examples, without considering whether his examples cover all possible cases. This is a kind of thinking that is encouraged too often even in American schools (looking for “patterns” without proving them).

If I were him, I would have wanted an actual proof, which would turn out to require some explicit conditions. We can see that he is missing the possibility of turning points that are not between zeros, as in my example, or more than one turning point not separated by zeros; he would (hopefully) have caught that if he had attempted a proof.

I might have started my attempted proof something like this:

If a polynomial has Z zeros, of which E have even multiplicity, and T turning points, I claim that T = Z + E – 1.

Proof: Between any two zeros there must be a turning point. If all the zeros had odd multiplicity, then zeros and turning points would have to alternate*, like z-t-z-t-z-t-z. There can be no turning points beyond the last zero in either direction*, because that would require another zero beyond it. Clearly in this case there will always be one more zero than turning point, so T = Z – 1. But since an even zero is itself a turning point, we need to add the number of these to the number of turning points, resulting in T = Z – 1 + E.

I put an asterisk (*) next to two statements there that are false.

It’s true that the graph must turn (at least once) between any two zeros; but it is possible to have more than one turning point between two zeros, as in Eric’s example, or between the last zero and infinity, as in Doctor Rick’s. My first attempt at a proof would have made me think more fully about all the possibilities, which would help in refining my claim.

This is not yet a genuine proof, so I’d look for ways to prove claims such as “Between any two zeros there must be a turning point.” In thinking about that, I would realize it needs to be changed to “Between any two zeros there must be at least one turning point,” and, “Beyond the last zero, there must be an even number of turning points.” This allows for arrangements like z-t-z-t-t-t-z-t-t, which violate the two asterisked claims. This changes everything; the proof fails in the general case.

I don’t know how general we could make the conditions; but Mr. Kumar is right in his correction, that if we make a strong condition that all zeros must be real, then his formula will work. I’ll leave that to the reader to try proving. (But I have completely convinced myself that it is true before publishing this claim!)
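Before attempting that proof, it may help to test the claim on a few polynomials whose zeros are all real. The sketch below is evidence, not a proof: it builds each polynomial from its zeros, then counts sign changes of the derivative on an exact rational grid. The helper names, the interval \([-5, 5]\), and the grid spacing are my own choices, and the grid is assumed fine enough to separate neighboring critical points.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_from_roots(roots_with_mult):
    """Expand the product of (x - r)^m over the given (r, m) pairs."""
    p = [1]
    for r, m in roots_with_mult:
        for _ in range(m):
            p = poly_mul(p, [-r, 1])
    return p

def derivative(p):
    """Coefficient list of the derivative of p."""
    return [k * c for k, c in enumerate(p)][1:]

def evaluate(p, x):
    """Evaluate p at x by Horner's rule."""
    total = 0
    for c in reversed(p):
        total = total * x + c
    return total

def count_turning_points(roots_with_mult, lo=-5, hi=5, steps=2000):
    """Count sign changes of the derivative on an exact rational grid."""
    dp = derivative(poly_from_roots(roots_with_mult))
    count, prev = 0, None
    for i in range(steps + 1):
        x = Fraction(lo) + Fraction(hi - lo) * i / steps
        y = evaluate(dp, x)
        if y == 0:
            continue               # exact zero of the derivative; keep scanning
        sign = y > 0
        if prev is not None and sign != prev:
            count += 1
        prev = sign
    return count

def formula(roots_with_mult):
    """T = Z - 1 + E, where E counts the zeros of even multiplicity."""
    z = len(roots_with_mult)
    e = sum(1 for _, m in roots_with_mult if m % 2 == 0)
    return z - 1 + e

examples = [
    [(1, 1), (-1, 2)],                     # (x-1)(x+1)^2
    [(2, 2), (-3, 4)],                     # (x-2)^2 (x+3)^4
    [(1, 1), (-3, 2), (2, 3), (-1, 4)],    # Sam's degree-10 polynomial
]
for ex in examples:
    print(ex, count_turning_points(ex), formula(ex))
```

In each of these cases the sampled count of turning points agrees with \(Z-1+E\), which is consistent with the corrected claim.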


The question came from Amia in late April:

Hi Dr Math,

Can you help me to solve this question:

Find the intersection points for the two functions f(x) = x^n, g(x) = x^{1/n}, n a positive integer.

One thing that makes this interesting is that we want a general answer for any *n*, not just a solution of a single equation; the behavior of these functions varies with *n*.

Doctor Rick answered:

Yes, we will be glad to help you. As you know from working with us, our focus is on helping you solve the problem, which means we need to see your ideas or thoughts first in order to proceed.

Sketching a few graphs should give you a good idea of the solution. Try sketching the graphs for n = 1, 2, 3, and 4. Then tell us what you observe, what conclusions you draw, and what questions remain in your mind.

Sketching will take a little thought if you do it by hand (which is worth the trouble!). But I’ve made the graphs he suggests the easy way, using Desmos:

For \(n=1\): \(f(x)=x^1=x, g(x)=x^{1/1}=x\)

(Here, black represents the solution: Since the two functions are identical, every point on the line is a solution!)

For \(n=2\): \(f(x)=x^2, g(x)=x^{1/2}=\sqrt[2]{x}\)

Here, there are only two solutions, 0 and 1.

For \(n=3\): \(f(x)=x^3, g(x)=x^{1/3}=\sqrt[3]{x}\)

Now there are three solutions: \(-1,0,1\).

For \(n=4\): \(f(x)=x^4, g(x)=x^{1/4}=\sqrt[4]{x}\)

Now we’re back to only two solutions.

There is a clear pattern: For **odd n** greater than 1, there are three solutions (\(-1, 0, 1\)), while for **even n**, there are only two (\(0, 1\)).

Amia replied, giving an answer, then the algebraic reasoning (rather than making graphs):

When n is odd x = 0, -1, 1

When n is even x = 0, 1

I start with equaling f and g,

x^{n} = x^{1/n}

By raising the two sides to the power n:

(x^{n})^{n} = x

x(1 – (x^{n})^{n-1}) = 0

x = 0 , (x^{n})^{n-1} = 1

If n is odd, x = 1. If n is even, x = -1, 1.

The last line must have a typo (since it doesn’t match the answer given), and the work itself isn’t quite right either. We’ll correct it at the end; for now, see if you can identify the error for yourself. (It’s easy to miss, especially when you are following someone else’s work whom you generally trust; neither Doctor Rick nor I caught this one until I was editing it just now!)

Doctor Rick replied, first commenting on the answer given in the first two lines, and then on the work:

Yes, that’s the solution I find by extrapolating from what I see in my sketches for n = 2 and 3. It’s not quite complete.

I am hesitant to raise both sides to the power n, because I know this can introduce extraneous roots. It’s fine for odd n, since f(x) = x^{n} is a monotonically increasing function for odd integers n; but I’d want to handle even values of n separately if I took this approach.

I don’t know how you reached your conclusion from the line above it. What I see there is that

x = 0 OR x^{n(n–1)} = 1

In the second case, the exponent is necessarily even (since either n or n–1 must be even), so I conclude x = 1 or –1 … for any n, odd or even. But an extraneous root has indeed been introduced for even n.

The sort of extraneous roots mentioned here is the second one discussed in Extraneous Solutions: Causes and Cures. For an odd exponent, \(f(x)=x^n\) is **monotonically increasing**, meaning that whenever *x* increases, so does *y*; as a result, it is **one-to-one** and the result of applying it to an equation is **equivalent** to the original equation. But for an even exponent, the function is **not** one-to-one, so that the new equation may be true when the original was not.

Amia’s big error is in factoring \(\left(x^n\right)^n\) as \(x\cdot\left(x^n\right)^{n-1}\), when the former would really be equivalent to \({\color{Red}{x^n}}\cdot\left(x^n\right)^{n-1}\). Doctor Rick either missed, or chose not to point out, that error in the work leading to the line he quoted.

My own thought was to start as you did, but divide both sides by x^{1/n}:

x^{n} = x^{1/n} [1]

x^{n} / x^{1/n} = 1

x^{n – 1/n} = 1 [2]

Being careful not to gain or lose solutions, I must note that what I did above is not allowed if x^{1/n} = 0, which will happen if and only if x = 0. Thus I must consider that case separately, and I see that x = 0 satisfies the original equation for any n.

So we have found that \(x=0\) is a solution **for all n**; what else?

Continuing with the x ≠ 0 case, my thought was to write 1 as x^{0}, and use a property of exponents to conclude that

n – 1/n = 0

But here x has vanished, and I find the solution n = ±1. Since n is restricted to positive integers, I seem to have proved that

(a) x = 0 is a solution of [1] for all positive integers n.

(b) Any real x is a solution of [1] if n = 1.

And that’s all! Whereas your approach (as I interpreted it) introduced spurious solutions, my method has missed solutions. What did I do wrong?

I know where the error lies. Can you see it? It has to do with the conditions on the property of exponents that I used.

Making a mistake can be a teaching opportunity; learning to recognize (subtle) errors is important. So, what did he do wrong?

(Notice that we have found the case Amia missed, namely that all real *x* are solutions when \(n=1\); but we haven’t yet found that \(x=1,-1\) are solutions.)

After Amia replied, unable to locate the error, Doctor Rick added a hint:

Did you notice the property of exponents that I used? I didn’t state it explicitly, but it was this:

If x^{a} = x^{b} then a = b.

Under what conditions on the variables is this true?

Whenever we apply a property, we need to make sure it applies under the conditions that are present.

Amia made an unclear reply, to which Doctor Rick answered:

What I asked you was: Under what conditions on the variables is it true that

If x^{a} = x^{b} then a = b ?

I have looked for a full statement of this property, but many lists of properties of exponents only include those that are useful in simplifying exponential expressions. Here is one place online that I find the full statement:

https://www.varsitytutors.com/hotmath/hotmath_help/topics/solving-exponential-equations

If b is a positive number other than 1, then b^{x} = b^{y} if and only if x = y.

Thus we can’t apply the property if the base (x for us) is negative, 0, or 1.

This property is the **one-to-one** property of exponentiation: It says that there is only one exponent (with a given base, which in our context is *x*) that will produce any given number (if any). It is **not** true if the **base is 0** (since zero to any (non-zero) power is 0), or if the **base is 1** (since 1 to any power is 1), or if the **base is negative** (we’ll explore this below).
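A quick numerical check makes the three exceptions concrete. This is only an illustrative sketch in Python; the helper name `exponents_giving` and the range of exponents tried are choices made for this example, not anything from the original exchange.

```python
def exponents_giving(base, target, exponents=range(1, 7)):
    """Return every exponent e in `exponents` for which base**e == target."""
    return [e for e in exponents if base ** e == target]

# A "proper" base (positive, not 1): only one exponent produces a given value.
print(exponents_giving(2, 8))     # [3]

# Bases 0 and 1: every positive exponent produces the same value.
print(exponents_giving(0, 0))     # [1, 2, 3, 4, 5, 6]
print(exponents_giving(1, 1))     # [1, 2, 3, 4, 5, 6]

# A negative base: all the even exponents produce the same value.
print(exponents_giving(-1, 1))    # [2, 4, 6]
```

In the last three cases, knowing that two powers are equal tells us nothing definite about the exponents, which is exactly why the property cannot be applied there.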

I find an equivalent statement in a textbook:

Other sources may be more vague because they hide the conditions on the base inside their definition of what counts as an exponential function. It’s easy to forget, as he said, because we normally work only with proper exponential functions.

I applied the property with a = n – 1/n and b = 0 (or equivalently, with a = n and b = 1/n). I concluded that n must be 1 or –1. In those cases x could have any value; otherwise there was no solution. But now we see that this conclusion applies only to x > 0, x ≠ 1. Thus here is what I have established so far:

Solve

x^{n} = x^{1/n} [1]

where n is a positive integer.

If n = 1, we have x^{1} = x^{1} and every x is a solution. (This is the part you have missed.)

Otherwise, no positive x other than 1 could be a solution.

Checking x = 0 and x = 1, we find that both are solutions for any n.

We saw that missing case in our first graph above. Special cases are easy to miss when you are doing algebra, because we get used to assuming that variables have typical values, not exceptional values!

We still haven’t settled the question of negative x. The trouble with negative bases is that a non-integer power of a negative number is in general complex (and not single-valued). But here, we know x^{n} is real, so for any solution, x^{1/n} must be real. For negative x, this will only be the case if n is odd. This establishes that, for even n (except 0), the only two solutions of equation [1] are 0 and 1.

For even *n*, \(x^{1/n}=\sqrt[n]{x}\) will be undefined (as far as a graph is concerned) for negative *x*; for example, \((-4)^{1/2}=\sqrt[2]{-4}=2i\), which is not a real number, and so can’t be graphed. That is, the domain of this function is restricted to non-negative real numbers. So we can ignore the negative case for even *n*; we’ve already found all solutions.

But whereas raising both sides of an equation to an even power yields extraneous roots, an **odd** power is well behaved, with its domain including all real numbers:

For odd n, we can do as you did, because, as I said earlier, f(x) = x^{n} is a monotonically increasing function if n is odd. We raise both sides of equation [1] to the nth power, obtaining

(x^{n})^{n} = (x^{1/n})^{n}

x^{2n} = x

x^{2n} – x = 0

x(x^{2n–1} – 1) = 0

x = 0 or x^{2n–1} = 1

Since 2n–1 is odd, in the second case x is a real odd root of unity, namely 1 or –1.

This is all rather ugly, compared to simply sketching graphs, which gave us the solution quickly and cleanly.

Unfortunately, Doctor Rick has made a mistake similar to Amia’s: \((x^n)^n\) is not \(x^{n+n}=x^{2n}\), but \(x^{n\cdot n}=x^{n^2}\), just as \(\left(x^\frac{1}{n}\right)^n=x^{n\cdot\frac{1}{n}}=x^1=x\). So the work here should be $$(x^n)^n=\left(x^\frac{1}{n}\right)^n\\x^{n^2}=x\\x^{n^2}-x=0\\x\left(x^{n^2-1}-1\right)=0\\x\left(x^{(n+1)(n-1)}-1\right)=0$$ So either \(x=0\) or \(x^{(n+1)(n-1)}=1\). Since we are assuming *n* is odd, the exponent is **even**, and there are two real even roots of \(1\), namely \(1\) and \(-1\). (The only real *odd* root of unity is unity.)

So we can conclude that for even *n*, the solutions are \(0,1\), and for odd *n* greater than 1, the solutions are \(-1,0,1\); while for \(n=1\), all *x* are solutions. This is what we saw in the graphs.

That may feel like more work than we needed; but it’s good exercise.
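As a sanity check on that conclusion, here is a small Python sketch that tests a handful of candidate values of *x* for several *n*. The helper names `real_root` and `is_intersection`, the list of candidates, and the tolerance are all assumptions made for this illustration; it is a spot check, not a proof.

```python
import math

def real_root(x, n):
    """Real n-th root of x, or None when no real root exists
    (negative x with even n)."""
    if x >= 0:
        return x ** (1.0 / n)
    if n % 2 == 1:
        return -((-x) ** (1.0 / n))
    return None

def is_intersection(x, n):
    """True if x**n equals the real n-th root of x, up to rounding error."""
    root = real_root(x, n)
    return root is not None and math.isclose(x ** n, root, abs_tol=1e-9)

candidates = [-2, -1, -0.5, 0, 0.5, 1, 2]
for n in range(1, 6):
    print(n, [x for x in candidates if is_intersection(x, n)])
```

For \(n=1\) every candidate passes; for even *n* only 0 and 1 pass; for odd *n* greater than 1, the survivors are \(-1, 0, 1\), in agreement with the graphs.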

For a question with interesting relationships with this one, see When Can a Function Equal Its Inverse?


We’ll start with this question from 2000:

Box and Whisker Plots I don't understand box and whisker plots. All I know is that a box and whisker plot is used to display data. I can't find information on this anywhere else.

Doctor TWE answered:

Hi Ramiro - thanks for writing to Dr. Math.

A box-and-whisker plot (often simply called a box plot) is a graphical way of showing data. It is useful for quickly finding outliers - data points out of line with the rest of the data set.

Suppose we want to construct a box plot of the following test scores:

    50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

If they're not already in numerical order, it's best to arrange them in ascending order.

This is needed in order to find quartiles. (As we’ve seen, one way to put them in order is to construct a stem-and-leaf plot.)

First, we need to construct the "box." To do so, we must find the upper and lower quartiles and the median. The median is the number in the middle of our set (when arranged in numerical order). The upper and lower quartiles are the values 1/4 of the way from the top or bottom of our set. In our example:

    50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100
                ^               ^               ^
               L.Q.           Median           U.Q.

We’ll discuss how to find quartiles below.

To draw the box, we'll put a scale on the x-axis and draw a box from the lower quartile to the upper quartile. We'll add a vertical line to mark the median, like so:

                              LQ     M UQ
                               +-------+
                               |     | |
                               +-------+
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

where LQ = Lower Quartile, M = Median, UQ = Upper Quartile.

So the box contains the **middle half** of the data, with a wall at the very middle, separating the second and third quarters of the data. Its width is called the **interquartile range**.

Now we add "fences." First, we compute the inter-quartile range (IQR). The IQR = UQ - LQ. So in our example IQR = 85 - 77 = 8. The inner fences are 1.5*IQR below the L.Q. and 1.5*IQR above the U.Q. For our example, the inner fences are at:

    77 - 1.5*8 = 77 - 12 = 65

and at

    85 + 1.5*8 = 85 + 12 = 97

We'll mark these with a dotted line (I'll use colons ":"). Sometimes the fences are not drawn on the box plot, but we'll put them in so we can see where they are:

                  LIF         LQ     M UQ         UIF
                   :           +-------+           :
                   :           |     | |           :
                   :           +-------+           :
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

where LIF = Lower Inner Fence, UIF = Upper Inner Fence.

Here the distance from LQ to LIF, and from UQ to UIF, is 1.5 times the distance from LQ to UQ.

There is also a set of outer fences. These are 3*IQR below the L.Q. and 3*IQR above the U.Q. For our example, the outer fences are at:

    77 - 3*8 = 77 - 24 = 53

and at

    85 + 3*8 = 85 + 24 = 109

We'll mark these with another dotted line. These are always twice as far out as the inner fences. Here's what we have so far:

      LOF         LIF         LQ     M UQ         UIF         UOF
       :           :           +-------+           :           :
       :           :           |     | |           :           :
       :           :           +-------+           :           :
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

where LOF = Lower Outer Fence, UOF = Upper Outer Fence.

Here we went that same distance further to make the new fences. You might think of the fences as something like the outfield fence in a baseball stadium that marks automatic home runs.

Now we add the "whiskers." Find the first value above (to the right of) the Lower Inner Fence. Mark it with an X and draw a line connecting it to the box. Similarly, find the first value below (to the left of) the Upper Inner Fence. Mark it with an X and draw a line connecting it to the box as well. In our example, the end values for our whiskers are at 73 (the first value above 65) and 95 (the first value below 97.) Our plot now looks like this:

      LOF         LIF         LQ     M UQ         UIF         UOF
       :           :           +-------+           :           :
       :           :       X---|     | |---------X :           :
       :           :           +-------+           :           :
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

What we’ve done here is to include the rest of the data as “whiskers” projecting from the box, but cut off any part of the whiskers that would extend past the inner fences.

Finally, we have to mark the outliers. Values between the inner and outer fences are called "suspect outliers." We mark them with an asterisk "*". Values outside the outer fences are called "highly suspect outliers." We mark them with an "o". In our example, we have two suspect outliers: the 60 and the 100. We also have one highly suspect outlier: the 50. Once we mark these on our plot, we're finished:

      LOF         LIF         LQ     M UQ         UIF         UOF
       :           :           +-------+           :           :
    o  :      *    :       X---|     | |---------X :  *        :
       :           :           +-------+           :           :
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

We’ll discuss later what makes these outliers important.

We could "erase" the fences and labels, but I'd probably leave them in so that the person looking at the graph can see where they are. If we erase them, we'll have:

                               +-------+
    o         *            X---|     | |---------X    *
                               +-------+
    ^.........^.........^.........^.........^.........^.........^
    50        60        70        80        90        100       110

I’ve seen introductory presentations that omit the distinction of outliers, and therefore don’t mention fences.

As you can see, this plot quickly gives an idea of what our data look like. Half the numbers are between 77 and 85, the middle of the data set is at 83, the "reasonable" range of the data goes from 73 to 95, and we have three suspect data values at 50, 60, and 100. A nice feature of this kind of plot is that all the computations are relatively simple. We never had to do anything more than add, subtract, and multiply by 1.5 and 3.
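The arithmetic behind the fences, whiskers, and outlier labels in this example can be sketched in a few lines of Python. The quartiles are simply taken as given from the example, and the variable names are made up for this illustration. One assumption to note: a value lying exactly on a fence is treated here as being inside it, which is only one of several possible conventions.

```python
scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
lq, uq = 77, 85                      # quartiles, taken from the example

iqr = uq - lq                        # inter-quartile range
inner = (lq - 1.5 * iqr, uq + 1.5 * iqr)
outer = (lq - 3 * iqr, uq + 3 * iqr)

# Whiskers end at the most extreme values that are not past an inner fence
# (assumption: a value exactly on a fence counts as inside it).
inside = [x for x in scores if inner[0] <= x <= inner[1]]
whiskers = (min(inside), max(inside))

# Suspect outliers: past an inner fence, but not past an outer fence.
suspect = [x for x in scores
           if outer[0] <= x <= outer[1] and not inner[0] <= x <= inner[1]]

# Highly suspect outliers: past an outer fence.
highly_suspect = [x for x in scores if x < outer[0] or x > outer[1]]

print(iqr, inner, outer)
print(whiskers, suspect, highly_suspect)
```

This reproduces the numbers in the plot: fences at 65 and 97 (inner) and 53 and 109 (outer), whiskers ending at 73 and 95, suspect outliers 60 and 100, and one highly suspect outlier, 50.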

Here are some examples of actual box-and-whisker plots, to show some of the variation in style:

R:

Another question, from a teacher in 2000, asked for details on the calculation of the quartiles:

Box and Whisker Plots Dear Dr. Math, My 7th-grade students and I are drawing box and whisker plots. I am looking for confirmation on the placement of the first and third quartiles. If there is an odd number of data, is the median considered to be part of the subgroup used to find the upper (or lower) quartile? It seems reasonable to do so if that makes the subgroup an odd number of data, and not to do so otherwise. Is that right? Example: 1, 2, 3, 4, 5 The median is 3; should the lower quartile be 2 or 1.5?

If we take the lower quartile as the median of 1, 2, 3, including the median, we get 2; if we take it as the median of 1, 2, then we get 1.5. Such uncertainty arises when you have very small datasets like this one, and is not quite so troublesome in real life.

The definition and calculation of quartiles varies among textbooks; both of those issues are mentioned in our post The Many Meanings of “Quartile”. The book used below is called “M&S” in the answer by Doctor TWE shown there.

Doctor TWE answered this, too:

Hi Drew - thanks for writing to Dr. Math. According to _Statistics for Engineering and the Sciences_, W. Mendenhall and T. Sincich, 1995:

For small data sets, given n data points A_1 to A_n, the lower and upper quartiles are calculated as follows:

1. Calculate l = (1/4)(n+1), round to the nearest integer (if l falls halfway between two integers, round UP)
2. A_l is the lower quartile.
3. Calculate u = (3/4)(n+1), round to the nearest integer (if u falls halfway between two integers, round DOWN)
4. A_u is the upper quartile.

The notation here means that “*l*” is the index of the lower quartile, and “*u*” is the index of the upper quartile, in the sorted data set.

So for your example, n = 5 and

    l = (1/4)(5 + 1) = 1.5 -> 2, thus LQ = A_2 = 2
    u = (3/4)(5 + 1) = 4.5 -> 4, thus UQ = A_4 = 4

That is, we round 1.5 up and use the 2nd data point for LQ, and we round 4.5 down and use the 4th data point for UQ.

Note that with this definition, the upper and lower quartiles are always one of the data points (thus, they could not be 1.5 and 4.5 for your example). This differs from the median, which is the average of the middle two if n is even, and thus might not be one of the data points. The reason for the "inconsistency" in rounding quartiles that fall halfway between two integers (i.e. they end in .5) is to achieve symmetry. If we rounded up on both quartiles when they ended in .5, for your example data set we'd have LQ = 2 and UQ = 5, and these are not equidistant (in terms of number of data points) from the median or the extreme data points.

Note that this is ultimately an arbitrary choice, which is why different authors make different choices. My preference, expressed in the post I referred to, is equivalent to his for this example, and to Drew’s preference to include or exclude the median in each half in such a way as to make it an odd number of values.

Consider our example data: $$50,60,73,77,80,81,82,83,84,84,84,85,88,95,100$$

Using Drew’s approach, we first find the median, the 8th of the 15 data values: $$50,60,73,77,80,81,82,{\color{Red}{\mathbf{\underline{83}}}},84,84,84,85,88,95,100$$ Then we take the median of the first and the last 7 (not 8) values: $$50,60,73,{\color{Green}{\mathbf{\underline{77}}}},80,81,82,{\color{Red}{\mathbf{\underline{83}}}},84,84,84,{\color{Green}{\mathbf{\underline{85}}}},88,95,100$$

Using the M&S method, \(n=15\), so we calculate $$l=\frac{1}{4}(n+1)=\frac{15+1}{4}=4,$$ which doesn’t need rounding; the 4th value is \(A_4=77\). Similarly, we calculate $$u=\frac{3}{4}(n+1)=\frac{3(15+1)}{4}=12;$$ the 12th value is \(A_{12}=85\). So both methods produce the same result here; this is an easy case where all methods tend to agree.
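The M&S rule is easy to mechanize. The following Python sketch implements the four steps as quoted above; the function name `ms_quartiles` is invented for this illustration, and the asymmetric rounding is done with `floor` and `ceil` rather than Python's built-in `round` (which rounds halves to even and so would not match the rule).

```python
import math

def ms_quartiles(data):
    """Lower and upper quartiles by the small-data-set rule quoted above
    (assumes at least one data value)."""
    values = sorted(data)
    n = len(values)
    l = math.floor((n + 1) / 4 + 0.5)       # nearest integer, halves round UP
    u = math.ceil(3 * (n + 1) / 4 - 0.5)    # nearest integer, halves round DOWN
    return values[l - 1], values[u - 1]     # A_l and A_u (1-based indices)

print(ms_quartiles([1, 2, 3, 4, 5]))        # (2, 4)
scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
print(ms_quartiles(scores))                 # (77, 85)
```

Both results match the hand calculations: Drew's five-point example gives 2 and 4, and the test scores give 77 and 85.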

Yet another question, from another teacher in 2000, goes a little deeper:

Outliers in a Box-And-Whisker Plot I am teaching box and whisker plots to my seventh grade students. We have calculated 1.5 times the IQR and added it to the upper quartile and subtracted it from the lower quartile. If any data is beyond those points, it is an outlier. The question is, "would it be an outlier if the point were equal to 1.5*IQR away from one of the quartiles?" My instincts tell me that it would not be; that the point would need to be farther away than that. This appears to be confirmed by the TI-83 calculator, since it does not graph the point as an outlier. However, I used a worksheet for an assignment that had only one point that was exactly equal to 1.5*IQR away, and none farther away. The first question on the page asks them to identify the outlier. This implies that it would be an outlier. I have looked in every book I have, searched your archives and the archives of other math sites online and can't find a clarification anywhere. They all explain how to find them, but aren't specific enough to answer my question.

Doctor TWE answered yet again:

Hi Bob - thanks for writing to Dr. Math. Outliers are data points that are outside the range of the data values that we want to describe. Outliers can be due to an error in measurement of the value, a value from a different population, or simply a rare chance event. In any case, where we draw the line for "outside" is somewhat arbitrary. (Why 1.5*IQR? Why not 2*IQR? Or sqrt(10)*IQR? Or (pi/2)*IQR?) When taking statistical measurements, there is a "gray area," and methods for finding outliers are simply guidelines to help us find errant points - they're not intended to be absolute. The farther away from the mean a data point is, the more suspect it is. I would describe a data point that is exactly 1.5*IQR away from the Quartiles as a "borderline outlier." The idea is to recognize that these data points are more likely to be "tainted."

A value doesn’t suddenly become suspect when it reaches the fence; we choose arbitrarily to put the fence there!

The method you describe is, in fact, only one way of finding outliers. Another method is to define an outlier as any data point where the absolute value of the z-score is greater than 3 (i.e. it lies more than 3 standard deviations away from the mean). This definition would create a different set of boundaries for outliers.

For our example data, the mean is 80.4, and the standard deviation is 12.3, so we would reject anything below \(80.4-3\cdot12.3=43.5\), or above \(80.4+3\cdot12.3=117.3\); so by this standard there are no outliers.
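Here is a short Python check of that calculation, using the standard `statistics` module. One assumption is made explicit: the value 12.3 is reproduced by the *sample* standard deviation (dividing by \(n-1\)), so that is what the sketch uses.

```python
import statistics

scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)     # sample standard deviation (n - 1 divisor)

# Flag any value whose z-score exceeds 3 in absolute value.
outliers = [x for x in scores if abs(x - mean) / sd > 3]

print(round(mean, 1), round(sd, 1), outliers)
```

Even the most extreme score, 50, has a z-score of only about \(-2.5\), so nothing is flagged, in contrast to the three values flagged by the fences.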

Incidentally, my college statistics textbook (_Statistics for Engineering and the Sciences_, 4th edition; W. Mendenhall and T. Sincich; Prentice-Hall; 1995) describes suspect outliers as "observations that fall between the inner fences and the outer fences," where inner fences are defined at 1.5*IQR and outer fences are defined at 3*IQR. To me, this implies that a data point exactly on the inner fence would not be considered a suspect outlier (since it is not "between" the fences). But then it proceeds to describe highly suspect outliers as "observations that fall outside the outer fences." But how then are we to interpret a data point that falls exactly on the outer fence? It is, strictly speaking, neither "between the fences" nor "beyond the outer fence." [Perhaps we can call it a "somewhat highly suspect outlier."] An important thing to note is that in the end-of-chapter summary, it describes both of these as "rules of thumb for detecting outliers."

In helping statistics students, I run across a number of these rules of thumb, and have to tell them to use whatever rule their book uses. This is true, for example, of deciding whether a sample is large enough to assume a normal distribution.

The bottom line: The reliability of data points in our data set is not an "all-or-nothing" situation, but rather colored in shades of gray. Where we choose to "draw the line" is somewhat arbitrary and can be determined using different methods. So pick a method, and just be consistent.

And if you are in a class, or using a textbook, let the author or teacher pick the method!

Taking that idea further, here is a question from 2001:

Outliers What is the definition of outlier?

Doctor Mitteldorf answered:

Dear Mrs. Ben-Ami, For a variety of definitions of "outlier," you can use a searcher like Google to look for the words definition outlier. You'll find definitions like these:

1) Outlier - a data point that is an "unusual" observation and likely should be discarded. Note: The median is less affected by outliers than is the mean.

2) A number that is far apart from the rest of the data; an extreme value either much lower or much higher than the rest of the values in the data set. Outliers are known to skew means or averages.

The first definition emphasizes the “suspect” nature of an outlier: that it might be bad data that shouldn’t be used. The second focuses only on its being extreme, without suspicion. Both are valid.

But I'm afraid you've unearthed an embarrassing secret of the statistical trade: An outlier is a point which your data set is better off without. If you can prove your point better by ignoring some small portion of your data, why not ignore it? It's probably just a blunder on the part of the person collecting data, or some special, irrelevant circumstance that we needn't investigate in detail. There is no rigorous definition of an "outlier," and generations of statisticians have made their employers' data look better than they really are by selectively eliminating from analysis inconvenient data points.

That is, the subjective nature of the decision to reject an outlier makes it an easy place to hide bias, and make the answer fit your expectations or preferences. That isn’t to say that it is only that.

Having said all that - there is some justification for the concept. Usually, there are many small sources of difference that together cause data to be scattered in a recognizable pattern, and from analyzing that pattern, you can conclude a great deal both about the difference and about the average properties of the data. And it's often true that, in a large data set, something odd happens to a few of the measurements that doesn't happen to the rest. It can be as simple as reading the meter wrong, or that some process was inadvertently left incomplete at a few of the sites. You look at the data and they fall into a smooth and regular pattern except for a few points that stick out and make you wonder what happened.

So the concept of an "outlier" and the reason for eliminating them from a data set before analysis are both legitimate; it's just that the process of recognizing outliers lies outside of any objective, mathematical process, and is thus subject to easy abuse. Statistical analysis is sometimes done today by pure scientists whose only motive is to seek truth, but more often it is done on contract to organizations that have much at stake in the outcome. There is pressure to make the analysis come out in one direction, and the selective elimination of "outliers" is a favorite tool for justifying the distortion of science by political ideology or economic interest or even a theoretical bias of the scientist himself.

So it’s good to recognize outliers and decide how to deal with them; but be careful! Don’t let outliers be an excuse for twisting the data, by getting rid of the data points that you wish weren’t there.


We’ll start with a question from 1997:

Stem-and-leaf Graph or Stemplot Hi! I was doing a math-a-thon and I got a problem about a stem leaf graph. I am in the advanced math class. My math teacher said it would take two days to teach his advanced class how to do it. Can you help?

Doctor Chita answered:

Hi Scott: Sure, I can try to help. A stem-and-leaf graph, also called a stemplot, is a way to represent the distribution of numeric data. It was invented by John Tukey, a mathematician, and is a quick way to picture data for numbers that are greater than 0. I'll explain using an example.

Tukey’s “exploratory data analysis” is used to visualize data by hand, when there are not too many numbers; the plot looks much like a histogram, showing the “shape” of the data at a glance, but includes the actual data values. It can also be used as a trick for sorting data, as we’ll see. (It *can* actually be used with negative data, but we rarely see that.)

Suppose you have the following set of numbers (they might represent the number of home runs hit by a major league baseball player during his career).

    32, 33, 21, 45, 58, 20, 33, 44, 28, 15, 18, 25

The stem of a stemplot can have as many digits as needed, but the leaves should contain only one digit. To create a stemplot to display the above data, you must first create the stem. Since all of the numbers have just two digits, start by arranging the tens digits from smallest to largest.

    1
    2
    3
    4
    5

Usually we will be dealing with two-digit numbers; sometimes we need to round in order to have only two digits, and often we need to work around a decimal point, as we’ll see. We think of each number as consisting of a “leaf”, the last digit, that identifies the individual number, and the rest of the number as a “stem”, by which the numbers are grouped.

To create the leaves, draw a vertical bar after each of the tens digits and arrange the ones digits from each number in the data set in order from smallest to largest. If there are duplicate numbers, like 33, list each one.

    1|58
    2|0158
    3|233
    4|45
    5|8

The shape of the resulting display looks something like a bar graph oriented vertically. By examining the stemplot, you can determine certain properties of the data.

For example, to plot the first number, 32, we put its leaf, 2, to the right of its stem, 3. Commonly we will initially place the leaves in the order they arrive, which for our example of 32, 33, 21, 45, 58, 20, 33, 44, 28, 15, 18, 25 will produce this:

    1|58
    2|1085
    3|233
    4|54
    5|8

For some purposes, it can be left unsorted like this; but for the uses to which we will put it, we need to sort the leaves on each stem, as he did above:

    1|58
    2|0158
    3|233
    4|45
    5|8

In doing this, we have sorted all the numbers, which we can read back out as

15, 18,

20, 21, 25, 28,

32, 33, 33,

44, 45,

58
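The same build-then-read-back process can be sketched in Python. The function name `stemplot` and the row format are choices made for this illustration; the sketch assumes non-negative integers with a one-digit leaf, and (unlike a careful hand-drawn plot) it simply skips stems that have no leaves, which does not matter for this data set.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows such as '2|0158' for non-negative integers."""
    leaves = defaultdict(list)
    for value in sorted(data):            # sorting first sorts each row's leaves
        stem, leaf = divmod(value, 10)    # stem = all but the last digit
        leaves[stem].append(leaf)
    return [f"{stem}|{''.join(map(str, leaves[stem]))}"
            for stem in sorted(leaves)]

home_runs = [32, 33, 21, 45, 58, 20, 33, 44, 28, 15, 18, 25]
rows = stemplot(home_runs)
print("\n".join(rows))

# Reading the rows back out recovers the data in sorted order.
read_back = [int(stem) * 10 + int(leaf)
             for stem, leaf_digits in (row.split("|") for row in rows)
             for leaf in leaf_digits]
print(read_back)
```

When working by hand, the sorting happens the other way around: placing the leaves on their stems and then ordering each short row is what sorts the whole list.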

Now we can put the plot to use.

You can find the median by counting from either end of the stemplot until you find its center. Here, since there are 12 numbers, the center lies between 28 and 32. The median is the average of the two data points: (28+32)/2 = 30.

Here I have colored the leaves in spectrum order as I crossed them out, working from each end, and ending with the two middle numbers in **bold**:

1|~~58~~

2|~~015~~**8**

3|**2**~~33~~

4|~~45~~

5|~~8~~

Here we can see the middle numbers, 2**8** and 3**2**. The median is their average. (If there had been an odd number of values, we would have found one middle number, which would be the median.)

We can also just count the total number of data values, \(2+4+3+2+1=12\), and count 6 from one end (left to right, top to bottom) and 6 from the other end (right to left, bottom to top) to find the middle:

```
---->
1|58
2|0158
      |   (middle)
3|233
4|45
5|8
<----
```

Both approaches use the fact that the leaves represent all the data values listed in order, making this a shorthand for the complete sorted list.

You can also determine if there is a mode in the data set by looking at the plot. Here, the number 33 is the mode since it is the only value that occurs more than once.

We can determine this simply by looking in each row for duplicate digits:

1|58

2|0158

3|2**33**

4|45

5|8

In general, there could be no mode, or several. See Three Kinds of “Average”.
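Both readings of the plot (median and mode) can be checked with Python's `statistics` module. In this sketch the stemplot is represented as a dictionary from stem to list of leaves, which is just a convenient assumption for the illustration. Note that `statistics.multimode` returns every value when nothing repeats, so the sketch filters for values that actually occur more than once, in order to report "no mode" as an empty list.

```python
import statistics

# The sorted stemplot from above: stem -> leaves.
plot = {1: [5, 8], 2: [0, 1, 5, 8], 3: [2, 3, 3], 4: [4, 5], 5: [8]}

# Expand the plot back into the sorted list of data values.
values = [10 * stem + leaf for stem in sorted(plot) for leaf in plot[stem]]

median = statistics.median(values)      # averages the two middle values
modes = [v for v in statistics.multimode(values) if values.count(v) > 1]

print(len(values), median, modes)
```

With 12 values, the median is the average of the 6th and 7th (28 and 32), and the only repeated value is 33.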

If your data contain three-digit numbers (like batting averages, for example), you can use the same technique. For example, let's assume the data are

    298, 303, 285, 311, 225, 315, 250, 305

Ignore the ones digits in each number (these will be the leaves) and look at the remaining two digits in each number (the hundreds and tens digits). The stem will begin at 22 because the smallest number in the data set is 225. The stem will end at 31 because the largest number is 315. Include the two-digit numbers between 22 and 31 in the body of the stem.

It’s important to note that even stems with *no* leaves are to be included (see below), in order to accurately reflect the shape of the entire distribution. This is why we first find the smallest and largest numbers and list all stems between them, rather than just writing them as we find them.

Once you have the stem, then list the ones digits in each number after the corresponding two-digit number before it. The stemplot will look like this, with no leaves after the numbers without a corresponding value.

    22|5
    23|
    24|
    25|0
    26|
    27|
    28|5
    29|8
    30|35
    31|15

If these data represent the batting averages for a particular player, this display indicates that he has had a very successful career - most of his averages are clustered between 280 and 320.
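To show how the empty stems arise naturally, here is a Python sketch that lists every stem from the smallest to the largest before attaching leaves. The function name `stemplot_with_gaps` is hypothetical, and the sketch assumes non-negative integers with a single-digit leaf.

```python
def stemplot_with_gaps(data):
    """Stemplot rows for non-negative integers, keeping stems with no leaves."""
    values = sorted(data)
    # Every stem from that of the smallest value to that of the largest.
    stems = range(values[0] // 10, values[-1] // 10 + 1)
    return [f"{s}|{''.join(str(v % 10) for v in values if v // 10 == s)}"
            for s in stems]

averages = [298, 303, 285, 311, 225, 315, 250, 305]
rows = stemplot_with_gaps(averages)
print("\n".join(rows))
```

The gaps at 23, 24, 26, and 27 are what make the cluster from 28 to 31 stand out; dropping those rows would hide the shape of the distribution.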

If the numbers were more widely scattered (e.g. from 225 to 791, with 58 stems from 22 to 79, rather than just ten), this method would not work well, and we would probably round to the nearest ten, so that the stems would have only one digit.

One thing not mentioned here is that we often find a decimal point in the data, which we ignore; the plot above could just as well have represented the data 0.298, 0.303, 0.285, 0.311, 0.225, 0.315, 0.250, 0.305, or the data 2.98, 3.03, 2.85, 3.11, 2.25, 3.15, 2.50, 3.05. For this reason, it is common to include a “key” to explain the interpretation. For the original set of data, this might look like

Key: 29|8 = 298

For the others, it might be

Key: 29|8 = 0.298 Key: 29|8 = 2.98

A 1996 question fills in a little gap:

Stem and Leaf Plots Dear Dr. Math, I am in the Math Counts math competition, and when doing practice problems we came across this problem: Use the stem-and-leaf plot of the recent art project scores to find the mean score. Express as a decimal.

    5 | 0 0
    4 | 9 7 3 3 1
    3 | 8 7
    2 | 9

What in the world is a stem-and-leaf plot? Thank you very much, Molly

Here, rather than starting with data and *making* a stemplot, we are given one and asked to *interpret* it. (Note that the stems here are given in reverse order.) Doctor Robert answered, not giving a full explanation, but focusing on how to find the mean:

Stem and leaf plots are a way that statisticians can look at the distribution of numbers given to them to analyze. For example, in the stem-and-leaf plot you show, there were two scores in the 50's (They were both 50), 5 scores in the forties (49, 47, 43, 43, 41), two scores in the thirties (38, 37) and one score in the twenties (29). So all of the art scores were 50, 50, 49, 47, 43, 43, 41, 38, 37, and 29. You can find the average score by adding them and dividing by 10.

So the mean is just $$\frac{50+50+49+47+43+43+41+38+37+29}{10}=\frac{427}{10}=42.7$$

The mean doesn’t fit as well into this format as the median and mode; here we are just extracting the original data and finding their mean, rather than using the numbers as displayed. I’ll suggest a possible alternative below.

One last question, from 2002, will provide a useful review.

Mode, Mean, and Median in Stemplots I'm trying to help my 6th grader do homework. How do I find a "mode," "mean," and "median" using a stem/leaf plot? Problem:

    stem  leaf
      1   889
      2   035579
      3   138
      4   235

Doctor TWE answered:

Hi Linda - thanks for writing to Dr. Math. Each stem-and-leaf combination represents a data point in our set. So to find the mode, mean, and median of the set, we have to figure out how to interpret their definitions for this type of representation.

Presumably the student this time knows how to make and read a stemplot, which in this example represents the data $$18,18,19,20,23,25,25,27,29,31,33,38,42,43,45$$

The mode is defined as the data value that occurs most often. So we are looking for the leaf (number) that occurs the most often on one stem of the diagram. In your example, there are two 8 leafs on the 1 stem (i.e. two data points of value 18), and two 5 leafs on the 2 stem (i.e. two data points of value 25). So the data set is "bi-modal" with modes of 18 and 25. Note that I did not count the 5 leaf on the 4 stem because it represents a different value (45) - it just happens to have the same last digit as my mode of 25. I similarly did not count the 8 leaf on the 3 stem, nor the three different 3 leaves.

This is important: Digits on different stems represent different numbers, so we are not counting identical digits, but identical digits *on the same stem*. The two 9’s do not represent the same number, so we ignore them. Here, the two modes are in red and in green:

$$\begin{array}{r|l}1&{\color{Red}{\mathbf{88}}}9\\2&03{\color{DarkGreen}{\mathbf{55}}}79\\3&138\\4&235\end{array}$$

$$\mathbf{{\color{Red}{18,18}}},19,20,23,\mathbf{{\color{DarkGreen}{25,25}}},27,29,31,33,38,42,43,45$$
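The rule "same digit on the same stem" can be checked mechanically. In this small Python sketch (mine, with a hypothetical dictionary `plot` mapping each stem to its string of leaves), each stem-and-leaf pair is turned back into a number before counting, so a 5 on stem 2 and a 5 on stem 4 are never confused. When no value repeats, the function simply returns every value, which is one way of saying there is no mode.

```python
from collections import Counter

# Hypothetical representation: each stem maps to its string of leaf digits.
plot = {1: "889", 2: "035579", 3: "138", 4: "235"}

def modes(plot):
    """Return all values that occur most often, in increasing order."""
    counts = Counter(10 * stem + int(leaf)
                     for stem, leaves in plot.items() for leaf in leaves)
    top = max(counts.values())
    return sorted(value for value, c in counts.items() if c == top)

print(modes(plot))  # [18, 25]
```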

The mean is the conventional "average," and perhaps the best way to find this is to do it the conventional way - add the values and divide by the number of numbers. With the stem-and-leaf plot, that means that we'll have to "read" each stem-and-leaf as a conventional number. For your example we'll get: (18+18+19+20+23+25+25+27+29+31+33+38+42+43+45) / 15 = 436/15 = 29.1 (Do you see how I got the numbers I added?)

We could, instead, add all the leaves, then add the sum of each stem digit multiplied by its number of leaves, in order to more directly use the stemplot format: $$(8+8+9+0+3+5+5+7+9+1+3+8+2+3+5)+3(10)+6(20)+3(30)+3(40)=\\76+[30+120+90+120]=76+360=436$$ I haven’t seen this done, though!
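Here is that alternative as a short Python sketch (my own, using a hypothetical `plot` dictionary of stems and leaf strings): add the leaves, add ten times each stem once per leaf, and divide by the number of leaves.

```python
# Hypothetical representation: each stem maps to its string of leaf digits.
plot = {1: "889", 2: "035579", 3: "138", 4: "235"}

# Sum of all the leaf digits, read straight off the plot.
leaf_sum = sum(int(d) for leaves in plot.values() for d in leaves)
# Each stem contributes 10 * stem once for every leaf it carries.
stem_sum = sum(10 * stem * len(leaves) for stem, leaves in plot.items())
count = sum(len(leaves) for leaves in plot.values())

total = leaf_sum + stem_sum
mean = total / count
print(leaf_sum, stem_sum, total, round(mean, 1))  # 76 360 436 29.1
```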

We can also observe that the mean is located in the middle of the data, as indicated by the asterisk:

```
1 889
2 035579*
3 138
4 235
```

The median is the middle value in the set. This is relatively simple. Start crossing off pairs of high and low leaves. Start with the leftmost leaf on the bottom stem and the rightmost leaf on the top stem. When you only have one (or two) leaves left that have not been crossed out, that value (or the average of the two values) is the median. In your example (I'm using matching symbols to show which two were crossed out as a pair):

    stem  leaf
      1   X*#
      2   -+=@7@
      3   =+-
      4   #*X

The one I'm left with is the 7 leaf on the 2 stem, so the median is 27.

That is, using the coloring scheme I used above,

1 | ~~889~~
2 | ~~0355~~7~~9~~
3 | ~~138~~
4 | ~~235~~

In real life we would just mark digits in the order I did here, crossing them off or underlining. And the process is just what we do when the data are all written out: $$18,18,19,20,23,25,25,\mathbf{27},29,31,33,38,42,43,45$$
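The crossing-off procedure translates directly into code. This is only a sketch (the function name `median_by_pairs` is mine), and it assumes a non-empty list: it repeatedly discards the lowest and highest remaining values until one or two are left.

```python
def median_by_pairs(values):
    """Median found by crossing off low/high pairs, as on the stemplot."""
    remaining = sorted(values)
    # Cross off the lowest and highest remaining values together.
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    # One value left: that is the median; two left: average them.
    return sum(remaining) / len(remaining)

data = [18, 18, 19, 20, 23, 25, 25, 27, 29, 31, 33, 38, 42, 43, 45]
print(median_by_pairs(data))  # 27.0
```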

Last week we looked at how the **adjugate** matrix can be used to find an inverse. (This was formerly called the [classical] **adjoint**, a term that is avoided because it conflicts with another use of the word, but is still used in many sources.) I posted that as background for the question we’ll look at here.

This question came from Rohit, in early March:

In our textbook, it is mentioned that while solving system of linear equation by matrix method, if |A| = 0, we then calculate (adj A)B. But why?

How calculating (adj A)B tells us solution of system of linear equation??

This image appears to come from an NCERT textbook from India (chapter 4.7.1, page 32 of the pdf). The topic is solving a system of equations:

We saw last week how the inverse of a matrix can be used to solve a system of equations, and that the inverse of an invertible matrix can be found (somewhat inefficiently) as \(\frac{1}{\det(\mathbf{A})}adj(\mathbf{A})\), where \(adj(\mathbf{A})\) is the adjugate (or “adjoint”) matrix, the transpose of the matrix of cofactors. This is how the inverse is found in this book.

(Note that the book specifies **A** as a 3×3 matrix, and **X** and **B** as 3×1 column vectors (matrices), representing three equations in three unknowns. The general idea of using the inverse matrix applies more broadly, but in this context, using the adjugate to find the inverse is not entirely unreasonable. For larger matrices, as we saw, the complexity of the calculation becomes prohibitive.)

Here is Case I, in which the system has a unique solution:

But in Case II, when the determinant is 0, we can’t divide by it to find the inverse; instead, the book tells us in effect to skip the division, and just **multiply by the adjugate itself**. How does this tell us anything?

Doctor Fenton answered:

Hi Rohit,

If A is a square matrix, adj(A) is the transpose of the cofactor matrix of A. If A is singular, Ax = b must either be inconsistent, or consistent but having infinitely many solutions, so the second statement, that if (adj A)B = 0, then A must be either inconsistent or have infinitely many solutions, just repeats the previous sentence and there is nothing to prove.

That is, the last case listed just says we’ve learned nothing beyond what we know merely because the determinant is zero. This test doesn’t apply, and we need some other way to determine whether there are any solutions. (But we aren’t told about any alternative test.)

The only content is the first statement, that if (adj A)B ≠ 0, then A must be inconsistent. It’s not hard to show this with examples, but proving it in general is more difficult than I originally thought. About the only information available on (adj A) is that for an invertible matrix A,

A(adj A) = (adj A)A = |A| I,

where I is the identity matrix, so that

\(A^{-1}\) = (1/|A|)(adj A)

In fact, it is true that \((adj \mathbf{A})\mathbf{A}=\det(\mathbf{A})\mathbf{I}\) regardless of whether **A** is invertible; this was shown last time, though without emphasizing this case. In the product on the LHS, every element on the main diagonal is the sum of the products of the elements of the corresponding column of A with their cofactors, and is therefore equal to the determinant; every element off the diagonal is 0.

I suspect his initial difficulty in proving the claim comes from looking at the adjugate from a modern perspective, focusing on row reduction methods; we’ll be proving it later.

If A is singular, and B is in the range (or column space) of A, then examples show that (adj A)B = 0, but if B is not in the column space, then examples show that (adj A)B ≠ 0, but I don’t see how to prove this.

The column space is the set of column matrices **B** that can be obtained by multiplying **A** by any column matrix **X**. So if **B** is in the column space, the system has at least one solution.

It is very easy to get one’s mind turned around here. The claim to be proved is not that if \(adj(\mathbf{A})\mathbf{B}=\mathbf{0}\), then \(\mathbf{A}\mathbf{X}=\mathbf{B}\) has a solution, but that if \(adj(\mathbf{A})\mathbf{B}\ne\mathbf{0}\), then \(\mathbf{A}\mathbf{X}=\mathbf{B}\) has no solution. We’ll see such a mistake in a textbook, below! And it took me forever to get *my* head around all this!

When A is non-singular, then if we write the n×2n augmented matrix [A : I] and reduce the left half to reduced row echelon form, then the right half becomes \(A^{-1}\), which is (1/|A|) (adj A), but if A is singular and the left side is put in reduced row echelon form, the right side is not a scalar multiple of (adj A).

This is how we find an inverse by row reduction (as we also saw last week); here we see what the failure to find an inverse looks like in terms of the adjugate:

For example, if
$$A=\begin{bmatrix}1&2&3\\4&5&6\\5&7&9\end{bmatrix},\quad\text{then}\quad adj\,A=\begin{bmatrix}3&3&-3\\-6&-6&6\\3&3&-3\end{bmatrix}$$
(row 3 is the sum of the first two rows, so A is singular). Row-reducing
$$\begin{bmatrix}1&2&3&|&1&0&0\\4&5&6&|&0&1&0\\5&7&9&|&0&0&1\end{bmatrix}\quad\text{gives}\quad\begin{bmatrix}1&0&-1&|&0&\frac{7}{3}&-\frac{5}{3}\\0&1&2&|&0&-\frac{5}{3}&\frac{4}{3}\\0&0&0&|&1&1&-1\end{bmatrix},$$
so the right half is not a scalar multiple of adj A.

We didn’t obtain **I** on the left, so what we have on the right is not an inverse (there is none). Now, what if we make a consistent system with this matrix **A**?

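However, \(\begin{bmatrix}1&1&1\end{bmatrix}^T\) is a solution of
$$\begin{bmatrix}1&2&3\\4&5&6\\5&7&9\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}6\\15\\21\end{bmatrix},$$
and we find that
$$(adj\,A)\begin{bmatrix}6\\15\\21\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}\quad\text{while}\quad(adj\,A)\begin{bmatrix}6\\15\\20\end{bmatrix}=\begin{bmatrix}3\\-6\\3\end{bmatrix},$$
so this example bears out the claim of the second statement.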

In the first example, with \(\mathbf{B}=\begin{bmatrix}6\\15\\21\end{bmatrix}\), \(adj(\mathbf{A})\mathbf{B}=\mathbf{0}\), and there are infinitely many solutions, one of which is \(\begin{bmatrix}1\\1\\1\end{bmatrix}\); this is compatible with the second statement, which says we can’t be sure.

In the second example, with \(\mathbf{B}=\begin{bmatrix}6\\15\\20\end{bmatrix}\), \(adj(\mathbf{A})\mathbf{B}\ne\mathbf{0}\), and there is no solution, as implied by the first statement.
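For anyone who wants to check these numbers, here is a small Python sketch (my own, not from the thread) that builds the adjugate of a 3×3 matrix as the transpose of the cofactor matrix and multiplies it by each **B**. The helper names `minor`, `adjugate`, and `times` are hypothetical, and the code handles only the 3×3 case.

```python
def minor(m, i, j):
    """Determinant of the 2x2 matrix left after deleting row i and column j."""
    rows = [r for k, r in enumerate(m) if k != i]
    (a, b), (c, d) = [[x for k, x in enumerate(r) if k != j] for r in rows]
    return a * d - b * c

def adjugate(m):
    """Transpose of the cofactor matrix of a 3x3 matrix."""
    # Entry (i, j) of the adjugate is the cofactor of entry (j, i) of m.
    return [[(-1) ** (i + j) * minor(m, j, i) for j in range(3)]
            for i in range(3)]

def times(m, v):
    """Product of a matrix with a column vector given as a list."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

A = [[1, 2, 3], [4, 5, 6], [5, 7, 9]]
adjA = adjugate(A)
print(adjA)                      # [[3, 3, -3], [-6, -6, 6], [3, 3, -3]]
print(times(adjA, [6, 15, 21]))  # [0, 0, 0]
print(times(adjA, [6, 15, 20]))  # [3, -6, 3]
```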

But is this **always** so? We don’t know yet. In practice it doesn’t really matter:

However, I don’t see why anyone would want to use this criterion to determine that AX = B is inconsistent. Computing the adjugate matrix is an enormous amount of work: if A is n×n, adj A requires computing \(n^2\) determinants of size (n-1)×(n-1). That is far more work than simply using Gaussian elimination.

If we used the adjugate to find the inverse (which is not as hard in the 3×3 case as for larger matrices), then we have it sitting around to use in this test. But using Gaussian elimination to solve the system is little more work than using it just to find the inverse, and it gives us a simpler criterion for inconsistency.

We’ll set aside for now the question of whether the test as stated is true, because Rohit’s question turns out to be focused on a particular point.

After a technical problem disrupted the thread, Rohit restated the question, making his issue clearer:

I did not understand third point. In some book, it is written that if |A| = 0 and (adj A)B = 0 the system gives us infinite many solution i.e. it is consistent, but in our textbook it is written that it may be consistent or inconsistent. Please clarify this.

This gives us a little more context. The new image appears to come from a similar textbook, namely the last page of this pdf. This version does not explicitly mention, in case (iii), that “consistent” means “infinitely many solutions” (which I describe as “*too* consistent”); but otherwise it is equivalent to the first.

Rohit also restated the question a third time in another thread, due to the technical problems, quoting yet another source, apparently quoting from StackExchange, with a different numbering of the same cases:

For a system of equations in matrix form AX = B,

If |A| ≠ 0, there exists a unique solution \(X = A^{-1}B\)

That is fine, I understand this.

If |A| = 0,

Case 1: (adj A)⋅B ≠ O then solution does not exist and the system of equations is called inconsistent.

Case 2: (adj A)⋅B = O then system may be either consistent or inconsistent according as the system have either infinitely many solutions or no solution.

My doubt is: If |A| = 0 and (adj A)b ≠ 0 then the system is inconsistent clearly. If |A| = 0 and (adj A)b = 0 how can I reach the conclusion that “system may or may not be consistent according as the system have either infinitely many solutions or no solution” from |A|x=(adj A)b.

I understand case 1 but I do not understand case 2. Please help.

This narrows our focus to the last case. He sees this as a contradiction between books. As I read them, all three say the same thing; we haven’t yet seen “some book” that says that if \((adj\mathbf{A})\mathbf{B}=\mathbf{0}\), there must be infinitely many solutions. (We will.)

Doctor Rick responded:

I see that you want to focus on case 2. Doctor Fenton did answer this: He said,

If A is a square matrix, adj(A) is the transpose of the cofactor matrix of A. If A is singular, Ax = b must either be inconsistent, or consistent but having infinitely many solutions, so the second statement, that if (adj A)B = 0, then A must be either inconsistent or have infinitely many solutions, just repeats the previous sentence and there is nothing to prove.

In other words, the statement in Case 2 merely repeats what we already know, that the system must be either consistent or inconsistent; it adds nothing new that we would need to prove. Let me put that another way: The claim is that

If |A| ≠ 0 then the system AX = B has a unique solution.

If |A| = 0 and (adj A)B ≠ 0 then the system is inconsistent and has no solutions.

If |A| = 0 and (adj A)B = 0 then there is insufficient information to tell whether the system is inconsistent or has infinitely many solutions. These cases must be distinguished by other means.

The real issue seems to be what he said in the second thread, which can be answered by examples:

You also showed that another source says that if |A| = 0 and (adj A)B = 0 the system has infinitely many solutions, thus disagreeing with what was said above. Is this what you are asking about also: which claim is correct?

If so, consider two very simple examples, in which A is reduced to diagonal form:

$$1.\quad\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}3\\2\\0\end{bmatrix}\qquad\qquad2.\quad\begin{bmatrix}1&0&0\\0&0&0\\0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}3\\2\\1\end{bmatrix}$$

In each case, (a) Does the system AX = B have a unique solution, or an infinite number of solutions, or is it inconsistent (no solutions)? (b) What are |A| and (adj A)B?

These examples will demonstrate that, when \((adj\mathbf{A})\mathbf{B}=\mathbf{0}\), both outcomes are possible.

Rohit carried out the work for each case:

Sir, what you said in the last, I solved it:

So the two examples show that when \(|\mathbf{A}|=0\) and \((adj\mathbf{A})\mathbf{B}=\mathbf{0}\), it is possible that either there are no solutions, or infinitely many. The first equation represents the system $$\left\{\begin{matrix}x=3\\y=2\\0=0\end{matrix}\right.$$ which has infinitely many solutions, \(\{3,2,z\}\) for any *z*. The second equation represents the system $$\left\{\begin{matrix}x=3\\0=2\\0=1\end{matrix}\right.$$ which has no solutions.
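Since Rohit's work is not reproduced here, this is a sketch of how the computation might go (my own working, not necessarily his). Each matrix has a row of zeros, so \(|\mathbf{A}|=0\) in both cases. For the first, the only nonzero cofactor is the one for the lower right entry:

$$adj\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}=\begin{bmatrix}0&0&0\\0&0&0\\0&0&1\end{bmatrix},\qquad\begin{bmatrix}0&0&0\\0&0&0\\0&0&1\end{bmatrix}\begin{bmatrix}3\\2\\0\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}$$

For the second, every 2×2 minor contains a row or column of zeros, so the adjugate is the zero matrix:

$$adj\begin{bmatrix}1&0&0\\0&0&0\\0&0&0\end{bmatrix}=\begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix},\qquad\begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix}\begin{bmatrix}3\\2\\1\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}$$

In both cases \((adj\mathbf{A})\mathbf{B}=\mathbf{0}\), yet one system is consistent and the other is not.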

But Rohit continued,

I think I get it. But I had read in Rd Sharma’s book that

and author proves it also.

Is above proof wrong? If yes, then how we can prove it theoretically?

Now we finally see the book in which an incorrect statement is made (called case (ii) this time). And there is a “proof”, so we can have more to talk about!

Doctor Rick replied:

I see that I was right, and you are indeed writing because two sources give contradictory information. Thanks for showing us the other source, including its proof; that is a great help. Here is the part of the proof that we’re talking about (part ii):

Theorem 2 (Criterion of consistency) Let AX = B be a system of n linear equations in n unknowns.

(ii) If |A| = 0 and (adj A)B = 0, then the system is consistent and has infinitely many solutions.

PROOF

(ii) We have,

AX = B, where |A| = 0.

(adj A)(AX) = (adj A)B

((adj A)A)X = (adj A)B

\((|A|I_n)X\) = (adj A)B

|A|X = (adj A)B

If |A| = 0 and (adj A)B = O, then |A|X = (adj A)B is true for every value of X.

So, the system of equations AX = B is consistent and it has infinitely many solutions.

Can you find the flaw? It’s well hidden, but obvious once you see it.

Here is the reasoning I see in the proof:

If AX = B, then |A|X = (adj A)B.

If |A| = 0 and (adj A)B = O, then the conclusion above, |A|X = (adj A)B, is true.

Therefore the premise must be true: AX = B, so the system has solutions.

Is this good reasoning? NO! It boils down to the claim that

P implies Q

Q is true

Therefore P is true

This is the logical fallacy of affirming the consequent.

This fallacy is mentioned in our post Patterns of Logical Argument. It is also called the Fallacy of the Converse.

Note too that since the proof assumed that AX = B, the most that could have been claimed is that the conclusion is true for every value of X such that AX = B, not for every value of X, as claimed. If the system is inconsistent, then there is no such X, and whatever has been proved is true vacuously. Thus the result is consistent with the system being inconsistent, if I may put it that way.

But the fundamental error is affirming the consequent, or confusing sufficient conditions with necessary conditions. This is a good lesson to learn, but finding an error in a textbook is not the way we want to learn it!

The fact that the **pattern of reasoning** is fallacious tells us that the **proof** is wrong; the examples we looked at show that the **theorem** as stated is wrong. (These are two different things; it is possible to have a bad proof of a true statement!)

Rohit was satisfied:

Thank you sir, now I understand fully. Thanks a lot

Now Doctor Fenton rejoined the discussion, looking for a proof of the theorem in its (apparently) correct form, namely that if \((adj\mathbf{A})\mathbf{B}\ne\mathbf{0}\), there can be no solutions.

I have been thinking about this question for a few days, so let me add my thoughts.

I have only seen the adjugate matrix in the context of computing the inverse matrix, in which case both A and adj A are invertible, and the equation AX = B has a unique solution for every B in \(R^n\), and adj A is \(|A|A^{-1}\). I have not seen any discussion of the adjugate when A is a singular square matrix.

For any matrix, there is a nullspace N(A) and a column space R(A) (also called the range of the matrix as a linear transformation). If A is singular, then N(A) is non-trivial, with some positive dimension k (called the nullity of A), and the column space has a positive dimension called the rank r of A. The Rank-Nullity Theorem says that r + k = n (if A is an n×n matrix), so that the column space is a proper subspace of \(R^n\). Every vector (column matrix) B in the column space is the image AX of some vector X in \(R^n\), B = AX. That is, the equation AX = B is consistent if and only if B is in the column space of A. Since the column space is not all of \(R^n\), AX = B will be consistent for some B and inconsistent for others. For any B in the column space, AX = B will have infinitely many solutions (if X is one solution of AX = B, then so is X + Y, when Y is any vector in the nullspace N(A)).

The nullspace of matrix **A** is the set of solution vectors **X** of \(\mathbf{A}\mathbf{X}=\mathbf{0}\); the column space of matrix **A** is the set of all values of \(\mathbf{A}\mathbf{X}\). Since we are talking about a matrix whose determinant is 0, it is singular, so its nullspace is not just \(\{\mathbf{0}\}\), and its column space does not include all vectors. So some equations \(\mathbf{A}\mathbf{X}=\mathbf{B}\) are consistent with infinitely many solutions (when **B** is in the column space), and others are inconsistent (when **B** is not). It could only have a unique solution if **A** were nonsingular.

All of this is known without considering the adjugate of A. The only real claim being made here is that if B is a vector such that (adj A)B ≠ 0, then AX = B is inconsistent. (We already knew that AX = B will either be inconsistent, or consistent with infinitely many solutions.) So essentially, the claim is that if (adj A)B ≠ 0, then B must not be in the column space.

We are trying to prove that, given that \(\det(\mathbf{A})=0\), if \((adj\mathbf{A})\mathbf{B}\ne\mathbf{0}\), then the equation \(\mathbf{A}\mathbf{X}=\mathbf{B}\) is necessarily inconsistent; that is, there are no solutions. This statement is equivalent to its contrapositive: If \(\mathbf{A}\mathbf{X}=\mathbf{B}\) **has** a solution, then \((adj \mathbf{A})\mathbf{B}=\mathbf{0}\).

But we know that \((adj\mathbf{A})\mathbf{A}=\det(\mathbf{A})\mathbf{I}\). Therefore since \(\det(\mathbf{A})=0\), \((adj\mathbf{A})\mathbf{A}=\mathbf{0}\). Now, if there is a solution to \(\mathbf{A}\mathbf{X}=\mathbf{B}\), it follows that $$(adj \mathbf{A})(\mathbf{A}\mathbf{X})=((adj \mathbf{A})\mathbf{A})\mathbf{X}=\mathbf{0}\mathbf{X}=\mathbf{0}$$ Therefore, we must have \((adj \mathbf{A})\mathbf{B}=\mathbf{0}\).

This proves the claim.

To put it another way, if \((adj \mathbf{A})\mathbf{B}\ne\mathbf{0}\), then **B** is not in the column space of **A**.

But as I pointed out earlier, this question can be completely determined by using elimination to write the augmented matrix [A:B] and reducing the left side to ref (row echelon form) or rref (reduced row echelon form). If A is singular, then there will be at least one entire row of 0’s in the left n×n portion of the augmented matrix. If the last column has a non-0 entry in the row with all 0’s in the coefficient part of the augmented matrix, then the system is inconsistent for that B.

We saw this solution method last time. Let’s demonstrate it with a couple of equations using the matrix \(A=\begin{bmatrix}1&2&3\\4&5&6\\5&7&9\end{bmatrix}\) that we saw above, whose adjugate is \(\begin{bmatrix}3&3&-3\\-6&-6&6\\3&3&-3\end{bmatrix}\):

- For the equation \(\mathbf{A}\mathbf{X}=\begin{bmatrix}6\\15\\20\end{bmatrix}\), we can row-reduce the augmented matrix like this: $$\begin{bmatrix}1&2&3&|&6\\4&5&6&|&15\\5&7&9&|&20\end{bmatrix}\rightarrow\begin{bmatrix}1&2&3&|&6\\0&1&2&|&3\\0&0&0&|&-1\end{bmatrix}$$ so the system is inconsistent.

In this case, using the adjugate calculated previously, we find that $$(adj \mathbf{A})\mathbf{B}=\begin{bmatrix}3&3&-3\\-6&-6&6\\3&3&-3\end{bmatrix}\begin{bmatrix}6\\15\\20\end{bmatrix}=\begin{bmatrix}3\\-6\\3\end{bmatrix}\ne\mathbf{0}$$ so that, according to the theorem, the system **must** be, in fact, inconsistent.

- For the equation \(\mathbf{A}\mathbf{X}=\begin{bmatrix}6\\15\\21\end{bmatrix}\), we can row-reduce the augmented matrix like this: $$\begin{bmatrix}1&2&3&|&6\\4&5&6&|&15\\5&7&9&|&21\end{bmatrix}\rightarrow\begin{bmatrix}1&2&3&|&6\\0&1&2&|&3\\0&0&0&|&0\end{bmatrix}$$ so the system has infinitely many solutions.

In this case, using the adjugate calculated previously, we find that $$(adj \mathbf{A})\mathbf{B}=\begin{bmatrix}3&3&-3\\-6&-6&6\\3&3&-3\end{bmatrix}\begin{bmatrix}6\\15\\21\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}=\mathbf{0}$$ so that the theorem does not tell us whether the system should be inconsistent; we had to do the reduction to find out.

We have already seen an example where \((adj \mathbf{A})\mathbf{B}=\mathbf{0}\) but the system is inconsistent; here is a slightly less trivial-looking one:

For the equation $$\begin{bmatrix}1&2&3\\1&2&3\\1&2&3\end{bmatrix}\mathbf{X}=\begin{bmatrix}1\\2\\3\end{bmatrix},$$ we can row-reduce the augmented matrix like this: $$\begin{bmatrix}1&2&3&|&1\\1&2&3&|&2\\1&2&3&|&3\end{bmatrix}\rightarrow\begin{bmatrix}1&2&3&|&1\\0&0&0&|&1\\0&0&0&|&1\end{bmatrix}$$ so the system is inconsistent.

But the adjugate is \(\begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix}\), and we find that $$(adj \mathbf{A})\mathbf{B}=\begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix}\begin{bmatrix}1\\2\\3\end{bmatrix}=\begin{bmatrix}0\\0\\0\end{bmatrix}=\mathbf{0}$$ which, again, is not enough to tell us anything.
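The elimination test used in these examples can be sketched in Python as follows. This is my own illustration, not code from the discussion; the function name `classify` is hypothetical, and exact fractions are used so that no rounding can hide a zero.

```python
from fractions import Fraction

def classify(A, B):
    """Row-reduce [A:B] and report 'unique', 'infinite', or 'inconsistent'."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(A, B)]
    rank = 0
    for col in range(n):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(n):
            if r != rank and M[r][col] != 0:
                factor = M[r][col] / M[rank][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[rank])]
        rank += 1
    # A row of zeros in the coefficient part with a nonzero last entry
    # says 0 = c for some c != 0, so there is no solution.
    if any(all(x == 0 for x in row[:n]) and row[n] != 0 for row in M):
        return "inconsistent"
    return "unique" if rank == n else "infinite"

A = [[1, 2, 3], [4, 5, 6], [5, 7, 9]]
print(classify(A, [6, 15, 20]))
print(classify(A, [6, 15, 21]))
print(classify([[1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, 2, 3]))
```

The three calls correspond to the three examples above: inconsistent, infinitely many solutions, and inconsistent again, the last being the case where the adjugate test is silent.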

Looking for a new topic, I realized that a recent question involves determinants, and an older one provides the background for that. We’ll continue the series on determinants by seeing how they can be used in finding the inverse of a matrix, and how something called the adjugate matrix might fit in (with side trips into Cramer’s Rule and row reduction).

This question came from Sarah, in February of last year:

I was studying matrices, and was thinking, is there some proof on finding the inverse of a matrix? I know how to do it step by step by heart but I do not understand what I’m doing and why it is like that.

For example, the inverse uses the determinant of a matrix – how do you interpret it? For instance, if the determinant of a 3×3 matrix is 2, what is that telling you about the matrix?

We also find minors – if an element has a minor of -1, what does that really mean, please?

We’ve recently seen what a determinant means, algebraically and geometrically; but the “meaning” in this context is a little different. We haven’t yet looked at minors, which are determinants of sub-matrices.

Doctor Fenton answered:

Hi Sarah,

Yes, there are ways of proving that a given algorithm does produce an inverse to a matrix, and there is more than one way to compute the inverse, one of which is to use determinants.

It would help to know what you already know about matrices. Do you use matrices to solve systems of linear equations, to transform vectors (column matrices), or for some other application?

Sarah replied,

Thanks for your reply. I’m using it in a course about mathematical economics where it is mostly applied to finding inverses to solve a system of 3 equations. If you’re familiar with some economic theory, there is also an application to find OLS estimators in a regression.

We had covered matrices before, but now I want to understand a bit deeper what I’m actually doing.

So I know if I’m using determinants, I can find the reciprocal of that and multiply by the adjoint, where the adjoint is the transpose of the cofactor matrix but beyond that, I still don’t know what the determinant is. I’ve always learnt it as “ad – bc”.

Even minors, I get the definition that you delete the ith and jth row and column and find determinant of resultant matrix, but doing that by heart is a bit strange because I don’t understand why I am doing that, in the sense I don’t know what the minor shows you and how it leads to the inverse matrix. I think that logic is why you can only apply inverses to square matrices, although to solve systems of equations, number of equations = number of unknowns shouldn’t be a problem.

We had previously covered row reduction technique, I also know Laplace expansion and the short hand rule. And we have solved systems using Cramer’s rule.

Thanks

We’ll touch on most of these topics: Finding the inverse using what she calls the “adjoint”, more often today called the “adjugate”, and also by row reduction; “minors” in a determinant (used in finding the adjugate, and also in the Laplace expansion for evaluating a determinant); and Cramer’s rule for solving a system of equations.

Sometime we will look into what matrices *are*, *why* they are added and multiplied as they are, and so on. But we’ll see the basics of multiplication and inverses momentarily.

Doctor Fenton responded, first stating what an inverse is:

Thank you for clarifying what you already know. Using the adjugate (previously called the adjoint) matrix to find the inverse is not the most efficient way to compute the inverse. I will illustrate the ideas with 2×2 matrices, although the idea works for square matrices of any size (only square matrices can have an inverse).

When I multiply two 2×2 matrices AB, with
$$A=\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\quad\text{and}\quad B=\begin{bmatrix}b_{11}&b_{12}\\b_{21}&b_{22}\end{bmatrix},$$
note that the product is
$$\begin{bmatrix}a_{11}b_{11}+a_{12}b_{21}&a_{11}b_{12}+a_{12}b_{22}\\a_{21}b_{11}+a_{22}b_{21}&a_{21}b_{12}+a_{22}b_{22}\end{bmatrix}=\begin{bmatrix}\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{11}\\b_{21}\end{bmatrix}&\begin{bmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{bmatrix}\begin{bmatrix}b_{12}\\b_{22}\end{bmatrix}\end{bmatrix}=\begin{bmatrix}A(B_1)&A(B_2)\end{bmatrix}$$
where \(B_1\) and \(B_2\) are the first and second columns of B. That is, to multiply A by the matrix \(B=[B_1\ B_2]\) on the right, you just multiply each of the columns in B by A.

To help us follow this, I’ll make a simple 2×2 example:

$$A=\begin{bmatrix}1&2\\3&4\end{bmatrix},B=\begin{bmatrix}2&-1\\1&3\end{bmatrix}\\AB=\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}2&-1\\1&3\end{bmatrix}=\begin{bmatrix}1\cdot2+2\cdot1&1\cdot-1+2\cdot3\\3\cdot2+4\cdot1&3\cdot-1+4\cdot3\end{bmatrix}=\begin{bmatrix}4&5\\10&9\end{bmatrix}=Y$$

The first column of the product is A times the first column of B:

$$\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}2\\1\end{bmatrix}=\begin{bmatrix}1\cdot2+2\cdot1\\3\cdot2+4\cdot1\end{bmatrix}=\begin{bmatrix}4\\10\end{bmatrix}$$

That’s how we multiply. So what is the inverse?

The inverse of a matrix A (if it exists) is the matrix \(A^{-1}\) such that

$$AA^{-1}=A^{-1}A=I,$$

where I is the identity matrix.

If A is invertible, and we want to solve the matrix equation AX=B, where X is a 2×1 column matrix \(\begin{bmatrix}x_1\\x_2\end{bmatrix}\) and B is a column matrix \(\begin{bmatrix}b_1\\b_2\end{bmatrix}\), we multiply AX=B by \(A^{-1}\) and get \(X=A^{-1}B\) as the solution.

For our A, the inverse (which we’ll calculate below in two ways) turns out to be $$A^{-1}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix},$$ which we can check by seeing that $$AA^{-1}=\begin{bmatrix}1&2\\3&4\end{bmatrix}\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}=\begin{bmatrix}1\cdot-2+2\cdot\frac{3}{2}&1\cdot1+2\cdot-\frac{1}{2}\\3\cdot-2+4\cdot\frac{3}{2}&3\cdot1+4\cdot-\frac{1}{2}\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}$$ and

$$A^{-1}A=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}\begin{bmatrix}1&2\\3&4\end{bmatrix}=\begin{bmatrix}-2\cdot1+1\cdot3&-2\cdot2+1\cdot4\\\frac{3}{2}\cdot1-\frac{1}{2}\cdot3&\frac{3}{2}\cdot2-\frac{1}{2}\cdot4\end{bmatrix}=\begin{bmatrix}1&0\\0&1\end{bmatrix}.$$

If we wanted to solve the equation \(AX=Y\), $$\begin{bmatrix}1&2\\3&4\end{bmatrix}X=\begin{bmatrix}4&5\\10&9\end{bmatrix},$$ we could multiply both sides by \(A^{-1}\) to get

$$X=A^{-1}Y=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}\begin{bmatrix}4&5\\10&9\end{bmatrix}=\begin{bmatrix}-2\cdot4+1\cdot10&-2\cdot5+1\cdot9\\\frac{3}{2}\cdot4-\frac{1}{2}\cdot10&\frac{3}{2}\cdot5-\frac{1}{2}\cdot9\end{bmatrix}=\begin{bmatrix}2&-1\\1&3\end{bmatrix},$$ which is our B above.

So, how do we find that inverse matrix?

To simplify notation by reducing the number of super- and subscripts, let me denote the inverse matrix of A, \(A^{-1}\), by C, so that \(C_1\) is the first column of C and \(C_2\) the second.

The equation \(AA^{-1}=AC=I\) can be written as

$$AC=A[C_1:C_2]=[AC_1:AC_2]=[E_1:E_2],$$

since \(E_1=\begin{bmatrix}1\\0\end{bmatrix}\) is the first column of I and \(E_2=\begin{bmatrix}0\\1\end{bmatrix}\) is the second.

Then \(AC_1=E_1\) and \(AC_2=E_2\), which says that \(C_1\) is the solution to \(AX=E_1\), and \(C_2\) is the solution to \(AX=E_2\).

In our example, we find the two columns of the inverse by solving $$AC_1=E_1$$ $$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_1=\begin{bmatrix}1\\0\end{bmatrix}$$ and $$AC_2=E_2$$ $$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_2=\begin{bmatrix}0\\1\end{bmatrix}$$

But you know how to solve AX = B by row reducing the augmented matrix [A : B] (the matrix A augmented with B as an extra column) to the form [I : X], so that the solution X is the last column of the reduced augmented matrix.

Then, to find the inverse matrix, we augment the matrix A with the identity matrix to form [A : I] (a 2×4 matrix) and row reduce to the form [I : C]; the inverse matrix will be the right half of the reduced 2×4 matrix. (If the left half cannot be reduced to I, then the matrix A is not invertible.) That is the efficient way to find \(A^{-1}\).

This is the standard method that he referred to before, and which we’ll see below. But we can also use determinants to solve this equation, which will lead to the adjugate. For that, keep reading …

Finding \(C_1\) and \(C_2\) each amounts to solving a system of equations, which we can do with determinants:

If you solve

$$\begin{aligned}ax+by&=u\\cx+dy&=v\end{aligned}$$

with elimination, multiplying the first equation by d and the second equation by b, and then subtracting, you get

(ad – bc)x = du – bv,

so

x = (du – bv)/(ad – bc),

or

$$x=\frac{\det\begin{bmatrix}u&b\\v&d\end{bmatrix}}{\det\begin{bmatrix}a&b\\c&d\end{bmatrix}},$$

and similarly y = (av – cu)/(ad – bc) is a quotient of determinants. This indicates where determinants can come from and can lead to Cramer’s Rule, but using determinants is not the best way to find the inverse.

Here we have derived Cramer’s Rule by brute force in the 2×2 case. As Wikipedia puts it,

Consider a system of *n* linear equations for n unknowns, represented in matrix multiplication form as follows: $$A\mathbf{x}=\mathbf{b}$$

where the *n* × *n* matrix A has a nonzero determinant, and the vector \(\mathbf{x}=(x_1,\dots,x_n)^T\) is the column vector of the variables. Then the theorem states that in this case the system has a unique solution, whose individual values for the unknowns are given by: $$x_i=\frac{\det(A_i)}{\det(A)}\; \; \; i=1,\dots n$$ where \(A_i\) is the matrix formed by replacing the i-th column of A by the column vector \(\mathbf{b}\).

So let’s solve our system this way, in order to find the inverse of A:

To find the first column of our inverse, we need to solve

$$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_1=\begin{bmatrix}{\color{Green}1}\\{\color{Green}0}\end{bmatrix}$$

Cramer’s rule gives this solution:

$$C_{11}=\frac{\begin{vmatrix}{\color{Green}1}&2\\{\color{Green}0}&{\color{Red}4}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{1\cdot{\color{Red}4}-2\cdot0}{1\cdot4-2\cdot3}=\frac{{\color{Red}4}}{-2}=-2$$

$$C_{21}=\frac{\begin{vmatrix}1&{\color{Green}1}\\ {\color{Red}3}&{\color{Green}0}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{1\cdot0-1\cdot{\color{Red}3}}{1\cdot4-2\cdot3}=\frac{-{\color{Red}3}}{-2}=\frac{3}{2}$$

But observe that the determinant on the top, in each case, is just the element (4 or 3) opposite the 1, with an alternating sign; I’ve highlighted them. These, as we’ll see, are **cofactors**.

So the first column is $$C_{1}=\begin{bmatrix}-2\\\frac{3}{2}\end{bmatrix}$$

Similarly, to solve

$$\begin{bmatrix}1&2\\3&4\end{bmatrix}C_2=\begin{bmatrix}{\color{Green}0}\\{\color{Green}1}\end{bmatrix}$$

we use

$$C_{12}=\frac{\begin{vmatrix}{\color{Green}0}&{\color{Red}2}\\{\color{Green}1}&4\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{0\cdot4-{\color{Red}2}\cdot1}{1\cdot4-2\cdot3}=\frac{-{\color{Red}2}}{-2}=1$$

$$C_{22}=\frac{\begin{vmatrix}{\color{Red}1}&{\color{Green}0}\\3&{\color{Green}1}\end{vmatrix}}{\begin{vmatrix}1&2\\3&4\end{vmatrix}}=\frac{{\color{Red}1}\cdot1-0\cdot3}{1\cdot4-2\cdot3}=\frac{{\color{Red}1}}{-2}=-\frac{1}{2}$$

So the second column of the inverse is $$C_{2}=\begin{bmatrix}1\\-\frac{1}{2}\end{bmatrix}$$

This gives us the inverse I showed before,

$$A^{-1}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}$$
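The column-by-column calculation above can be sketched in a few lines of Python. This is only an illustration for the 2×2 case: the function names `det2` and `cramer_2x2` are made up for this sketch, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def cramer_2x2(m, rhs):
    """Solve m [x, y]^T = rhs by Cramer's Rule (2x2, integer entries assumed)."""
    (a, b), (c, d) = m
    u, v = rhs
    d0 = det2(m)
    if d0 == 0:
        raise ValueError("determinant is zero; no unique solution")
    # Replace the first column by rhs for x, the second column for y.
    x = Fraction(det2([[u, b], [v, d]]), d0)
    y = Fraction(det2([[a, u], [c, v]]), d0)
    return [x, y]

A = [[1, 2], [3, 4]]
C1 = cramer_2x2(A, [1, 0])   # first column of the inverse
C2 = cramer_2x2(A, [0, 1])   # second column of the inverse
inverse = [[C1[0], C2[0]], [C1[1], C2[1]]]
print(inverse)
```

Each call solves one of the two systems \(AC_1=E_1\) and \(AC_2=E_2\), and the two solutions are placed side by side as columns.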

We almost used the adjugate here, though we haven’t yet even talked about what it is. We’ll get there eventually, but first, he answered the side questions:

Determinants have a geometric interpretation. The determinant of \(\begin{bmatrix}a&b\\c&d\end{bmatrix}\) is the area of the parallelogram with sides given by the vectors (a,b) and (c,d) in the plane.

I don’t know of any significance of this fact for solving linear systems, other than the fact that if the determinant is 0, then the system either has no solution or infinitely many solutions, depending upon the right side B.

Does this help?

This is the subject of our last two posts.

Sarah asked for a little more:

Thank you so much for that, Dr Fenton.

Just to make sure I understood, could you kindly illustrate through an example? I can then apply that myself to a 3×3, don’t worry.

Why is there such an emphasis on determinants not being the most efficient way, please?

The part on deriving the determinant and how it can lead to Cramer’s Rule is very interesting, thank you.

What about the part on minors, particularly interpreting them – the idea behind WHY we delete the \(i\)th row and \(j\)th column and take the determinant of the resultant matrix.

Thank you!

Doctor Fenton replied with, first, a statement of what we did above with Cramer’s Rule:

By an example, I assume that you want an example of using row reduction to compute an inverse of a matrix. In the 2×2 case, the determinant approach gives the inverse matrix

$$\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}=\frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix},$$

which doesn’t require much computation.

That matrix is, in fact, the adjugate.

Then he gave an example of the more efficient method of finding inverses, before getting back to minors:

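For a 3×3 example, to find

$$\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}^{-1},$$

we write

$$\left[\begin{array}{rrr|rrr}1&-1&0&1&0&0\\1&0&-1&0&1&0\\-6&2&3&0&0&1\end{array}\right]$$

and row reduce to

$$\left[\begin{array}{rrr|rrr}1&0&0&-2&-3&-1\\0&1&0&-3&-3&-1\\0&0&1&-2&-4&-1\end{array}\right],$$

so

$$\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}^{-1}=\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix}.$$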

We’ll see the adjugate method, for the same matrix, later.
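The row reduction of [A : I] to [I : C] described above can be sketched in Python as follows. This is a minimal sketch rather than a production routine: the function name `inverse_by_row_reduction` is made up here, and exact fractions are used (an assumption of convenience) so that no rounding occurs.

```python
from fractions import Fraction

def inverse_by_row_reduction(a):
    """Invert a square matrix by row reducing [A : I] to [I : C]."""
    n = len(a)
    # Build the augmented matrix [A : I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("left half cannot be reduced to I; not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear every other entry in this column.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the reduced matrix is the inverse.
    return [row[n:] for row in aug]

print(inverse_by_row_reduction([[1, 2], [3, 4]]))
print(inverse_by_row_reduction([[1, -1, 0], [1, 0, -1], [-6, 2, 3]]))
```

The same function handles both the 2×2 example and the 3×3 example, which is part of the appeal of row reduction: nothing about the procedure changes as the matrix grows.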

The reason for preferring row operations is complexity. Even in the 3×3 case, the arithmetic work required is not onerous, but for larger matrices, there is a big difference. It’s not hard to see that in general, computing an n×n determinant requires computing n! terms, while row-reducing an n×n matrix to upper triangular form takes roughly \(n^3/6\) operations, so reducing the left half of the augmented n×(2n) matrix to the identity will take about \(n^3/3\) operations. For n=2 or 3, n! and \(n^3/3\) are comparable, but for larger n, say n=10, 10! is over \(3\times10^6\), while \(10^3/3\) is about 300. For n=100, the value of 100! is an integer with 158 digits, while \(100^3/3\) is in the hundreds of thousands.

To compute the value of large determinants, it is more efficient to use row operations to transform the matrix to upper triangular form, since the determinant of a triangular matrix is just the product of its diagonal elements, and the effect of each row operation on a determinant is easy to determine: interchanging two rows changes the sign of the determinant; multiplying a row by a constant multiplies the determinant by the same constant; and replacing a row by the sum of itself and a multiple of another row doesn’t change the determinant.

This provides a way to find determinants that is quicker than doing it directly; but in the adjugate method we’re about to see, we’d need to calculate *many* determinants!
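Here is a minimal Python sketch of the determinant-by-row-operations idea just described. The function name `det_by_elimination` is made up for illustration, and exact fractions are assumed; only row swaps (which flip the sign) and adding a multiple of one row to another (which changes nothing) are used, so the determinant is the signed product of the diagonal.

```python
from fractions import Fraction

def det_by_elimination(a):
    """Determinant via reduction to upper triangular form."""
    n = len(a)
    m = [[Fraction(x) for x in row] for row in a]
    sign = 1
    for col in range(n):
        # Look for a usable pivot at or below the diagonal.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # a zero column below the diagonal: det is 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign                # interchanging rows changes the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Subtracting a multiple of another row leaves the determinant unchanged.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]               # product of the diagonal entries
    return result

print(det_by_elimination([[1, -1, 0], [1, 0, -1], [-6, 2, 3]]))
```

For the 3×3 matrix used in this post, this agrees with the value \(-1\) found by cofactors below.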

The adjugate is defined in terms of **minors**, which arise in the **Laplace expansion of a determinant**; so he explained that first. Here is what it looks like for a 3×3 determinant, starting with the algebraic definition we saw two weeks ago:

As for the Laplace expansion, I don’t know how Laplace discovered it, but if you look at the 3×3 case,

$$\begin{aligned}\det\begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}&=aei+cdh+bfg-ceg-afh-bdi\\&=a(ei-hf)+b(fg-di)+c(dh-eg)\\&=a\det\begin{bmatrix}e&f\\h&i\end{bmatrix}-b\det\begin{bmatrix}d&f\\g&i\end{bmatrix}+c\det\begin{bmatrix}d&e\\g&h\end{bmatrix}.\end{aligned}$$

You can pick any row (or column) and rewrite the determinant as a sum of the entries in that row (or column) times determinants which are the minors of the entries.

Each element of one row (here, the top) is multiplied by the determinant of the matrix formed by removing that element’s row and column. The **minor** of the bold entry here is the determinant of the part in red, and the **cofactor** is the minor multiplied by \(\pm1\):

$$\begin{vmatrix}\mathbf{a}&b&c\\d&{\color{Red}e}&{\color{Red}f}\\g&{\color{Red}h}&{\color{Red}i}\end{vmatrix}$$

$$\begin{vmatrix}a&\mathbf{b}&c\\ {\color{Red}d}&e&{\color{Red}f}\\ {\color{Red} g}&h&{\color{Red}i}\end{vmatrix}$$

$$\begin{vmatrix}a&b&\mathbf{c}\\ {\color{Red}d}&{\color{Red}e}&f\\ {\color{Red} g}&{\color{Red}h}&i\end{vmatrix}$$

The same pattern is true, almost trivially, of the 2×2 determinant: the minors are just the diagonally opposite entries, as I mentioned above.
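The expansion along the first row translates directly into a short recursive routine. The following Python sketch uses made-up names (`minor_matrix`, `det_laplace`) and is meant only to illustrate the pattern; as discussed above, it does about n! work and is not how large determinants should be computed.

```python
def minor_matrix(m, i, j):
    """Return m with row i and column j (0-based) deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det_laplace(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    # Each entry of the first row times its minor, with alternating signs.
    return sum((-1) ** j * m[0][j] * det_laplace(minor_matrix(m, 0, j))
               for j in range(len(m)))

print(det_laplace([[1, 2], [3, 4]]))
print(det_laplace([[1, -1, 0], [1, 0, -1], [-6, 2, 3]]))
```

In the 2×2 case the "minors" are 1×1 determinants, that is, just the diagonally opposite entries, which matches the remark above.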

Sarah now asked for the one missing piece:

Thank you Dr Fenton! This is why I love asking questions here – I always learn more than I ever thought I would before asking!

The part about number of operations isn’t as obvious to me, but I do get the gist why row operations are quicker.

Could you elaborate on the notion of minors, please? I’m still unsure what a minor of 4 would really be saying. I think there’s more to it that I just don’t know about.

And what about proving that 1/det multiplied by adjugate indeed gives you the inverse matrix, please?

Thank you

Doctor Fenton answered:

As I think I said earlier, I just regard minors as quantities which arise in evaluating determinants. As a determinant, it has a geometric interpretation as an area or volume in 2 or 3 dimensions, but I am not aware of any geometric significance to that fact. The Laplace expansion (or cofactor expansion) tells you that the absolute value of a 3×3 determinant is a volume of a 3-dimensional parallelepiped, which is a linear combination of some 2-dimensional areas (the areas corresponding to the minors of the determinant), but I don’t know that this interpretation helps understand what a determinant is.

This could be interesting to think more about, but if there is a meaning, it is not obvious.

Now we finally get to the adjugate:

As for the inverse formula of an invertible matrix A, you form the cofactor matrix C of A, where the entry in the \(i\)th row and \(j\)th column is \(c_{ij}\), the cofactor of the entry \(a_{ij}\) in A (that is, \((-1)^{i+j}M_{ij}\), where the minor \(M_{ij}\) is the determinant obtained by deleting the \(i\)th row and \(j\)th column of A). Next, you transpose the cofactor matrix, giving \(C^T\). This is the adjugate matrix.

Then the matrix product \(AC^T\) is

$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix}\begin{bmatrix}c_{11}&c_{21}&\cdots&c_{n1}\\c_{12}&c_{22}&\cdots&c_{n2}\\\vdots&\vdots&&\vdots\\c_{1n}&c_{2n}&\cdots&c_{nn}\end{bmatrix},$$

so the 11 entry of the product is

$$a_{11}c_{11}+a_{12}c_{12}+\dots+a_{1n}c_{1n},$$

which is exactly the cofactor expansion of det(A). The 12 entry of the product is

$$a_{11}c_{21}+a_{12}c_{22}+\dots+a_{1n}c_{2n},$$

which is the cofactor expansion (along the second row) of the determinant of the matrix

$$\begin{bmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{11}&a_{12}&\cdots&a_{1n}\\\vdots&\vdots&&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\end{bmatrix}.$$

This matrix has a repeated row, so the determinant of this matrix is 0.

Then the product \(AC^T\) is

$$\begin{bmatrix}\det(A)&0&0&\cdots&0\\0&\det(A)&0&\cdots&0\\0&0&\det(A)&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&\det(A)\end{bmatrix},$$

which is det(A)I, where I is the n×n identity matrix.

We’ve already done this in our 2×2 example. With $$A=\begin{bmatrix}1&2\\3&4\end{bmatrix},$$ the cofactor matrix is $$C=\begin{bmatrix}4&-3\\-2&1\end{bmatrix},$$ swapping diagonally opposite entries and changing the sign of every other one. Its transpose is $$C^T=\begin{bmatrix}4&-2\\-3&1\end{bmatrix},$$ which is the adjugate. Dividing this by the determinant, \(1\cdot4-2\cdot3=-2,\) we get $$A^{-1}=\begin{bmatrix}\frac{4}{-2}&\frac{-2}{-2}\\\frac{-3}{-2}&\frac{1}{-2}\end{bmatrix}=\begin{bmatrix}-2&1\\\frac{3}{2}&-\frac{1}{2}\end{bmatrix}.$$ This is what we got before.

Can you see the connection between this and what we did with Cramer’s Rule?

Now let’s do a 3×3 example; using the example Doctor Fenton used above, I’ll take $$A=\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}.$$

The cofactor of the first entry, \(a_{11}\), is $$(-1)^{1+1}\begin{vmatrix}0&-1\\2&3\end{vmatrix}=2,$$ so that is the first entry of the cofactor matrix. The cofactor of \(a_{12}\) is $$(-1)^{1+2}\begin{vmatrix}1&-1\\-6&3\end{vmatrix}=-(-3)=3.$$ Continuing, the cofactor matrix is $$C=\begin{bmatrix}2&3&2\\3&3&4\\1&1&1\end{bmatrix},$$ and the adjugate is $$C^T=\begin{bmatrix}2&3&1\\3&3&1\\2&4&1\end{bmatrix}.$$

The determinant of A is (using the cofactors of the first row) $$\det(A)=\begin{vmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{vmatrix}=1\cdot2+(-1)\cdot3+0\cdot2=2-3+0=-1.$$

So the inverse is $$A^{-1}=\frac{C^T}{\det(A)}=\frac{1}{-1}\begin{bmatrix}2&3&1\\3&3&1\\2&4&1\end{bmatrix}=\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix},$$ as we got by row reduction. We can check this by multiplying:

$$AA^{-1}=\begin{bmatrix}1&-1&0\\1&0&-1\\-6&2&3\end{bmatrix}\begin{bmatrix}-2&-3&-1\\-3&-3&-1\\-2&-4&-1\end{bmatrix}=\\\begin{bmatrix}1\cdot-2+-1\cdot-3+0\cdot-2&1\cdot-3+-1\cdot-3+0\cdot-4&1\cdot-1+-1\cdot-1+0\cdot-1\\1\cdot-2+0\cdot-3+-1\cdot-2&1\cdot-3+0\cdot-3+-1\cdot-4&1\cdot-1+0\cdot-1+-1\cdot-1\\-6\cdot-2+2\cdot-3+3\cdot-2&-6\cdot-3+2\cdot-3+3\cdot-4&-6\cdot-1+2\cdot-1+3\cdot-1\end{bmatrix}=\\\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}$$
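The whole adjugate procedure (cofactors, transpose, divide by the determinant) fits in a short Python sketch. The names `minor_matrix`, `det`, `adjugate`, and `inverse_by_adjugate` are made up for this illustration, and integer entries are assumed so that exact fractions can be used; the determinant here is computed by cofactor expansion, which is fine for small matrices only.

```python
from fractions import Fraction

def minor_matrix(m, i, j):
    """Return m with row i and column j (0-based) deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1                        # convention for the empty (0x0) matrix
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor_matrix(m, 0, j))
               for j in range(len(m)))

def adjugate(m):
    """Transpose of the cofactor matrix."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor_matrix(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]

def inverse_by_adjugate(m):
    """Adjugate divided by the determinant (integer entries assumed)."""
    d = det(m)
    if d == 0:
        raise ValueError("determinant is zero; matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(m)]

A = [[1, -1, 0], [1, 0, -1], [-6, 2, 3]]
print(adjugate(A))
print(inverse_by_adjugate(A))
```

Note how many small determinants `adjugate` has to evaluate (nine 2×2 determinants for a 3×3 matrix), which is the cost Doctor Fenton was warning about.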

A recent question asked for the connection between two different ways to use determinants geometrically: to find the **area** of a triangle, and to find the **volume** of a pyramid (or the area of a parallelogram and the volume of a parallelepiped). Last time we looked at **what a determinant is**, using some older questions; here we’ll first look at another old question that dealt specifically with volume, and then, in the recent question, we’ll see **how the volume and area formulas are related**. This will provide a second proof of the volume formula.

This question, from 2002, is not directly about the volume of a *parallelepiped*, but comes very close by discussing a *tetrahedron* (triangular pyramid). I’ll point out the relationship as we work through this.

Volume of a Tetrahedron

The volume of a tetrahedron is one-third the distance from a vertex to the opposite face, times the area of that face. Find a formula for the volume of a tetrahedron in terms of the coordinates of its vertices P, Q, R, and S.

I'm not even sure where to begin. I think it may have something to do with cross product multiplication of vectors.

The assignment is to create a volume formula similar to the area formula (using a determinant) that we have seen for a triangle, given its vertices. From our perspective, we will be relating the algebraic definition of a \(3\times3\) determinant, from last time, to a volume.

Doctor Pete answered:

Thanks for writing to Dr Math. Don't despair - you do know where to begin, because you mentioned vectors. So I'll begin by setting up some vectors, which will be denoted by capital letters, in terms of their coordinates, which will be lowercase letters.

Suppose you have vertices

P = (x1, y1, z1), Q = (x2, y2, z2), R = (x3, y3, z3), S = (x4, y4, z4).

Furthermore, we will write

A = Q - P, B = R - P, C = S - P.

In essence, we translated vector P to the origin, and moved Q, R, S accordingly, to obtain A, B, and C; this will simplify our work.

Our vectors A, B, and C represent the sides of the tetrahedron (or of a parallelepiped) that meet at vertex P.

As an example, taking vertex P at the origin, and using vectors $$A=\left<2,-4,3\right>,B=\left<3,1,2\right>,C=\left<-3,2,1\right>,$$ we get this pyramid:

and this parallelepiped:

We’ll be using the scalar (dot) product and the vector (cross) product, which we’ve explored in past posts.

Now recall that the dot product of two vectors M = (m1, m2, m3), N = (n1, n2, n3) satisfies the following properties:

[D1] M . N = m1*n1 + m2*n2 + m3*n3,

[D2] M . N = |M||N|Cos[t].

Here |M| signifies the magnitude (length) of M, and t is the angle between vectors M and N. As for the cross product, we have

[C1] M x N = \(\begin{vmatrix}\mathbf{i}&\mathbf{j}&\mathbf{k}\\m_1&m_2&m_3\\n_1&n_2&n_3\end{vmatrix}\),

[C2] M x N = |M||N|Sin[t]*U,

where in [C1], i, j, k are the unit x-, y-, and z-vectors, and in [C2], U is the unit vector that is orthogonal to M and N and points in the direction as specified by the right-hand rule. (In particular, we have i x j = k, j x k = i, k x i = j.) A proof of these facts is given in all textbooks dealing with linear algebra.

He has shown, for each product, both an **algebraic** meaning (in terms of components) and a **geometric** meaning (in terms of lengths and areas), as explained in the links I provided above.

The second property about cross products is the main connection to the geometry of the problem, because geometrically it says that the cross product of two vectors is a third vector orthogonal to the other two, with magnitude equivalent to the area of the parallelogram defined by the two vectors. Perhaps a picture will show this:

[Figure: parallelogram with adjacent sides 0M and 0N; the height h is dropped from N perpendicular to the line 0M.]

In the above diagram we are looking at the plane containing the vectors M and N. The height h of the parallelogram is simply |N|Sin[t], where t is the angle M0N, the angle between M and N. Thus the area of the parallelogram is |M||N|Sin[t]. The vector M x N is pointing in a direction perpendicular to this plane (straight at you), by the right-hand rule.

This area will be the base of our parallelepiped.

The curious thing about the cross product, then, is that the area of the triangle determined by points M, N, and 0, is simply half the magnitude of the cross product, because the parallelogram consists of two congruent copies of triangle M0N. Thus, in the case of our vectors A, B, C, we may choose any two of these to show that the area of the triangular face determined by, say, vectors B and C, is simply |B x C|/2.

The area of the parallelogram is just \(\text{Area }=|B\times C|\).

In our example, the area of the parallelogram formed by vectors B and C is the length of $$B\times C=\begin{vmatrix}\mathbf{i}&\mathbf{j}&\mathbf{k}\\3&1&2\\-3&2&1\end{vmatrix}=\left<-3,-9,9\right>,$$ namely $$|B\times C|=\sqrt{(-3)^2+(-9)^2+(9)^2}=\sqrt{171}.$$

But wait - there's more. We observe that the vector B x C is parallel to the height from the vertex at A to the opposite face. If we draw another picture in the plane that contains the vectors B x C, A, and the length from A to the plane containing B and C, as follows,

[Figure: side view. B x C points straight up from 0; B and C lie along the horizontal line through 0 and G; A rises from 0 at angle s to B x C; the vertical segment AG, of length d, drops from A to the point G on the horizontal line.]

we see that B and C are now projected onto this plane and appear as a single line. The important thing to realize is that in this picture, vector A is in the same plane as B x C and the line segment AG. If we let s = angle 0AG, then we may write the distance d of AG as simply

d = |A|Cos[s].

Here we see this angle between our \(A\) and \(B\times C\):

I can turn the image to show B and C approximately horizontal, the cross product vertical, and A at an angle:

Since $$|A|=\sqrt{(2)^2+(-4)^2+(3)^2}=\sqrt{29},$$ and $$|B\times C|=\sqrt{(-3)^2+(-9)^2+(9)^2}=\sqrt{171},$$ and $$A\cdot(B\times C)=\left<2,-4,3\right>\cdot\left<-3,-9,9\right>=(2)(-3)+(-4)(-9)+(3)(9)=57,$$ the indicated angle is $$\arccos\left(\frac{57}{\sqrt{29}\sqrt{171}}\right)\approx\arccos(0.8094)\approx35.96^\circ.$$

But we don’t need to calculate the angle; we already have the volume:

Therefore, the volume of the tetrahedron is

V = (1/3)d|B x C|/2 = (1/6)|A||B x C|Cos[s].

But s is also the angle between A and B x C, and if we recall the formula [D2] for the dot product of two vectors, we find that

V = (1/6)|A . (B x C)|.

The product A . (B x C) is more commonly called the (scalar) triple product, because (with some slight details omitted) the symmetry of our argument reveals that

A . (B x C) = B . (C x A) = C . (A x B).

The volume of the parallelepiped is the same thing without the \(\frac{1}{6}\): \(\text{Volume }=|A\cdot(B\times C)|\).

Now we may write A . (B x C) in terms of the coordinates using the formulas [D1, C1]. We have

$$B\times C=\begin{vmatrix}\mathbf{i}&\mathbf{j}&\mathbf{k}\\x_3-x_1&y_3-y_1&z_3-z_1\\x_4-x_1&y_4-y_1&z_4-z_1\end{vmatrix},$$

and since A = (x2-x1, y2-y1, z2-z1), we immediately see that

$$A\cdot(B\times C)=\begin{vmatrix}x_2-x_1&y_2-y_1&z_2-z_1\\x_3-x_1&y_3-y_1&z_3-z_1\\x_4-x_1&y_4-y_1&z_4-z_1\end{vmatrix}.$$

In terms of the vectors themselves, the volume of the parallelepiped determined by vectors \(A=\left<x_1,y_1,z_1\right>\), \(B=\left<x_2,y_2,z_2\right>\), and \(C=\left<x_3,y_3,z_3\right>\), would be $$\displaystyle\begin{vmatrix}x_1&y_1&z_1\\x_2&y_2&z_2\\x_3&y_3&z_3\end{vmatrix}$$

For our example, the volume is $$\begin{vmatrix}2&-4&3\\3&1&2\\-3&2&1\end{vmatrix}=\\(2)(1)(1)-(2)(2)(2)+(-4)(2)(-3)-(-4)(3)(1)+(3)(3)(2)-(3)(1)(-3)=\\2-8+24+12+18+9=57$$

This is just what we already calculated as that dot product!
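For anyone who wants to experiment with other vertices, the dot-and-cross-product route can be sketched in Python. The function names below (`cross`, `dot`, `parallelepiped_volume`, `tetrahedron_volume`) are made up for this sketch, and integer coordinates are assumed so that the tetrahedron volume can be returned as an exact fraction.

```python
from fractions import Fraction

def cross(m, n):
    """Cross product of two 3-vectors, by formula [C1] expanded."""
    return (m[1] * n[2] - m[2] * n[1],
            m[2] * n[0] - m[0] * n[2],
            m[0] * n[1] - m[1] * n[0])

def dot(m, n):
    """Dot product, formula [D1]."""
    return sum(x * y for x, y in zip(m, n))

def parallelepiped_volume(a, b, c):
    """|A . (B x C)| for edge vectors a, b, c."""
    return abs(dot(a, cross(b, c)))

def tetrahedron_volume(p, q, r, s):
    """(1/6)|A . (B x C)| with A = Q - P, B = R - P, C = S - P (integer coordinates)."""
    a, b, c = ([v[i] - p[i] for i in range(3)] for v in (q, r, s))
    return Fraction(parallelepiped_volume(a, b, c), 6)

A, B, C = (2, -4, 3), (3, 1, 2), (-3, 2, 1)
print(cross(B, C))
print(parallelepiped_volume(A, B, C))
print(tetrahedron_volume((0, 0, 0), A, B, C))
```

Cycling the three vectors, as in \(B\cdot(C\times A)\), should give the same triple product, which is the symmetry Doctor Pete mentioned.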

Thus we have a formula for V in terms of the coordinates of P, Q, R, and S. But shouldn't this formula be symmetric in the coordinates? It is - it's just that it isn't obvious from looking at it. I leave it to you to show that the above determinant is equivalent to

$$\begin{vmatrix}x_1&y_1&z_1&1\\x_2&y_2&z_2&1\\x_3&y_3&z_3&1\\x_4&y_4&z_4&1\end{vmatrix}.$$

This is the usual form we use when we are given the coordinates of the vertices, rather than the components of the vectors. You can prove that it agrees with the other (up to sign, which the absolute value removes) by expanding each determinant as a sum of products and comparing terms.

Now let’s look at the recent question, which compares two of the determinant formulas:

The new question came from Rohit in early March:

I read somewhere that the determinant of a matrix of 3 × 3 represents the volume of the parallelepiped.

But when we try to find the area of a triangle using the determinant, then there is also a matrix of 3 × 3 order.

So how is this formula telling the area, it should have told the volume?

Please clear my doubt.

This formula is one we discussed in Polygon Coordinates and Areas.

Doctor Rick answered:

Hi, Rohit.

Yes, the volume of a parallelepiped can be found using a determinant, and the area of a triangle can be found using a determinant. Obviously, the difference is in what determinant we use.

The volume of a parallelepiped with edges given by the vectors \(\left<x_1,y_1,z_1\right>\), \(\left<x_2,y_2,z_2\right>\), and \(\left<x_3,y_3,z_3\right>\) (in other words, with a vertex at (0, 0, 0) and the three vertices adjacent to it at \((x_1,y_1,z_1)\), \((x_2,y_2,z_2)\), and \((x_3,y_3,z_3)\)) is the absolute value of the determinant:

$$V=\begin{vmatrix}x_1&y_1&z_1\\x_2&y_2&z_2\\x_3&y_3&z_3\end{vmatrix}$$

This formula, which we saw above, uses **three vectors**, with **three coordinates each**, to describe a parallelepiped with one vertex at the origin.

Again, this volume, for our example, is $$\begin{vmatrix}2&-4&3\\3&1&2\\-3&2&1\end{vmatrix}=57$$

The triangle you’re talking about has vertices at \((x_1,y_1)\), \((x_2,y_2)\), and \((x_3,y_3)\), in two dimensions. The determinant we use to find its area is

$$A=\frac{1}{2}\begin{vmatrix}x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1\end{vmatrix}$$

This uses **three vertices**, with **two coordinates each**.

For example, for this triangle,

this formula gives $$\frac{1}{2}\begin{vmatrix}2&-4&1\\3&1&1\\-2&2&1\end{vmatrix}=\\\frac{(2)(1)(1)-(2)(1)(2)+(-4)(1)(-2)-(-4)(3)(1)+(1)(3)(2)-(1)(1)(-2)}{2}=\\\frac{2-4+8+12+6+2}{2}=\frac{26}{2}=13$$ (To check, it turns out to be a right triangle with legs \(\sqrt{26}\), whose area is … 13.)
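The area formula is easy to try on other triangles. Here is a minimal Python sketch; the function name `triangle_area` is made up, integer coordinates are assumed so the result is an exact fraction, and the absolute value is taken so that the order of the vertices does not matter.

```python
from fractions import Fraction

def triangle_area(p1, p2, p3):
    """Half the absolute value of the 3x3 determinant with rows (x, y, 1)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Expansion of the determinant along its first row.
    d = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    return Fraction(abs(d), 2)

print(triangle_area((2, -4), (3, 1), (-2, 2)))
```

Listing the vertices in the opposite order changes the sign of the determinant but not the area returned.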

So we have two different formulas, both using determinants, which give a volume and an area respectively.

It just isn’t the same thing; it isn’t at all surprising that it calculates a different quantity. In fact, the determinant here cannot give a volume. If the coordinates have the dimension of length (say, each is in meters), then the determinant (with terms of the form \(x_iy_j\)) will have units of meters squared, not meters cubed.

Recall from last week that a \(3\times3\) determinant is a sum of terms, each of which is a product of a number from each row and each column, so each term in this case is \(x_i\cdot y_j\cdot1\), which is in square units.

Checking dimensions can help distinguish formulas.

Now comes the fun part:

But can we find a connection between the two formulas? Yes, we can! It isn’t easy — I have never thought about it this way before — but it can be done.

Let’s put the triangle in the plane z = 1 of three-dimensional space, with the vertices at \((x_1,y_1,1)\), \((x_2,y_2,1)\), and \((x_3,y_3,1)\). (Here, 1, being a z coordinate, is a length, like the x’s and y’s.) Now regard those three vertices as the three vertices of a parallelepiped with fourth vertex (0, 0, 0). The volume of this parallelepiped is given by the formula above. Here is a figure.

The triangle we are starting with, ABC, is in **red**; the **black** lines are the edges of the parallelepiped, which is determined by the three vectors \(OA=\left<x_1,y_1,1\right>\), \(OB=\left<x_2,y_2,1\right>\), and \(OC=\left<x_3,y_3,1\right>\), so the volume formula gives $$\begin{vmatrix}x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1\end{vmatrix}$$

The two horizontal triangles, red and green, are congruent; the area of each is the area A calculated above.

They dissect the parallelepiped into two pyramids, each with volume 1/3 Ah, where the height h is 1, and a prismatoid whose volume is given by

$$V_p=\frac{1}{6}(A_1+4M+A_2),$$

where \(A_1=A_2=A\), the area of the triangle, and M is the area of the hexagon shown in yellow, midway between the two triangular sections.

The pyramids at the top and bottom, with red and green triangular bases, each have volume \(\frac{1}{3}A\). Each side of the hexagon is the midline of a triangle, and so is half the length of a side of the triangle; for example, side JK is a midline of triangle ABF, so its length is half that of segment AB.

Here we see the red and green triangles (relabeled as \(ABC\) and \(A'B'C'\)), and you can see that the solid red triangle \(A''B''O\) is similar to \(ABC\), with sides half as long. Repeating this observation, the hexagon can be divided into six triangles, each of which is \(\frac{1}{4}\) the area of triangle \(ABC\):

Therefore, \(M=6\left(\frac{1}{4}A\right)=\frac{3}{2}A\). We can put this into the formula for the volume of the prismatoid, and add on the two pyramids, to find the total volume V:

It can be shown that M = 3/2 A, so that the total volume of the parallelepiped is

V = 1/3 A + 1/6 (A + 4(3/2 A) + A) + 1/3 A = 2A

So

$$V=\begin{vmatrix}x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1\end{vmatrix}$$

This is just the determinant in the formula above for the volume of the parallelepiped with vertices at (0, 0, 0), \((x_1,y_1,1)\), \((x_2,y_2,1)\), and \((x_3,y_3,1)\).

If you missed it, he has shown that the volume of the parallelepiped is 2 units of length times the area of the triangle (in square units), so we just have to multiply the formula for the area by 2, which cancels out the \(\frac{1}{2}\):

$$A=\frac{1}{2}\begin{vmatrix}x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1\end{vmatrix}$$

$$V=\frac{1}{3}A+\frac{1}{6}\left(A+4\left(\frac{3}{2}A\right)+A\right)+\frac{1}{3}A=2A=\begin{vmatrix}x_1&y_1&1\\x_2&y_2&1\\x_3&y_3&1\end{vmatrix}$$

But this is the same result we get when we apply

$$V=\displaystyle\begin{vmatrix}x_1&y_1&z_1\\x_2&y_2&z_2\\x_3&y_3&z_3\end{vmatrix}$$

to our three vectors \(\left<x_1,y_1,1\right>\), \(\left<x_2,y_2,1\right>\),\(\left<x_3,y_3,1\right>\). So we’ve derived the matrix formula for this particular parallelepiped from the formula for the area of a triangle.
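The relation \(V=2A\) can also be spot-checked numerically on the example triangle. In this Python sketch the area is computed by an independent method (the shoelace sum, swapped in here purely as a cross-check), while the volume comes from the triple product of the lifted vectors; the function names are made up and integer coordinates are assumed.

```python
from fractions import Fraction

def shoelace_area(points):
    """Polygon area from the shoelace sum (independent of the 3x3 determinant)."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return Fraction(abs(total), 2)

def triple_product(a, b, c):
    """A . (B x C) for 3-vectors."""
    bxc = (b[1] * c[2] - b[2] * c[1],
           b[2] * c[0] - b[0] * c[2],
           b[0] * c[1] - b[1] * c[0])
    return sum(x * y for x, y in zip(a, bxc))

triangle = [(2, -4), (3, 1), (-2, 2)]
lifted = [(x, y, 1) for x, y in triangle]      # put the triangle in the plane z = 1

area = shoelace_area(triangle)
volume = abs(triple_product(*lifted))
print(area, volume, volume == 2 * area)
```

This only confirms the relation for one triangle, of course; the dissection argument above is what shows it in general.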

There are much better ways to prove the parallelepiped volume formula than this! However, it does show that the two formulas you asked about are consistent.

So, if we didn’t have a proof for the volume determinant, we could obtain it by way of the area determinant.

Of course, this is a special parallelepiped, with *z*-components of the three vertices all equal to 1, so it doesn’t directly prove the formula for any three vectors. With some work, we could generalize it. But the goal here was primarily to show the compatibility of the two determinant formulas.

First, a question from 1997:

Determinant of a Matrix

My Algebra 2 teacher told us that for extra credit we could give him a complete and unabridged version of the definition of a DETERMINANT of a matrix. He said that we could only find it in a library or an advanced collegiate math book. He also said that if we could understand the definition, it wasn't the right one. I have tried numerous dictionaries and other references over the Internet. Please help me if you can! Thank you so much.

Can we give a definition that can’t be understood by a student at this introductory level, and also make it understandable? We’ll see.

Doctor Tom answered:

Hi Mark,

I'll give you two definitions that are exactly equivalent, but sound very different. The first is geometric. I assume you've plotted things in an x-y coordinate system, right? I assume you can imagine doing the same thing in three dimensions with an x-y-z coordinate system as well.

In 2-D, when you talk about the point (2, 4), you can think of the "2" and "4" as directions to get from the origin to the point - "move 2 units in the x direction and 4 in the y direction."

In a 3-D system, the same idea holds - (1, 3, 7) means start at the origin (0,0,0), go 1 unit in the x direction, 3 in the y direction, and 7 in the z direction.

Similarly, you could have coordinates in one dimension, but there's just one number.

We’ll be looking at one-, two-, and three-dimensional “parallelotopes”, which are line segments, parallelograms, and parallelepipeds, respectively.

The determinant of a 1x1 matrix is the signed length of the line from the origin to the point. It's positive if the point is in the positive x direction, negative if in the other direction.

Here is a one-dimensional parallelotope, a **segment**, defined by a single point, or vector (in this case, a negative number), which can be represented as a \(1\times1\) matrix \(\begin{bmatrix}x_1\end{bmatrix}\):

The determinant \(\det\left(\begin{bmatrix}-3\end{bmatrix}\right)=\begin{vmatrix}-3\end{vmatrix}\) (not to be confused with an absolute value!) is \(-3\), the signed length of the segment.

In 2-D, look at the matrix as two 2-dimensional points on the plane, and complete the parallelogram that includes those two points and the origin. The (signed) area of this parallelogram is the determinant. If you sweep clockwise from the first to the second, the determinant is negative; otherwise, positive.

Here is a two-dimensional parallelotope, a **parallelogram**, defined by two vectors (pairs of numbers), which can be represented by a \(2\times2\) matrix, \(\displaystyle\begin{bmatrix}x_1&y_1\\x_2&y_2\end{bmatrix}\):

The determinant \(\displaystyle\begin{vmatrix}1&3\\-3&2\end{vmatrix}\) is \(11\), its area. If we reverse the order, to \(\displaystyle\begin{vmatrix}-3&2\\1&3\end{vmatrix}\), we get \(-11\), because we would be going around the figure clockwise rather than counterclockwise as shown.

In 3-D, look at the matrix as 3 3-dimensional points in space. Complete the parallelepiped that includes these points and the origin, and the determinant is the (signed) volume of the parallelepiped.

Here is a three-dimensional parallelotope, a **parallelepiped**, defined by three vectors (triples of numbers), which can be represented by a \(3\times3\) matrix, \(\displaystyle\begin{bmatrix}x_1&y_1&z_1\\x_2&y_2&z_2\\x_3&y_3&z_3\end{bmatrix}\):

The determinant \(\displaystyle\begin{vmatrix}-3&0&2\\1&-1&2\\4&1&2\end{vmatrix}\) is \(22\), its signed volume. (Just trust me on that number.)

The same idea works in any number of dimensions. The determinant is just the (signed) volume of the n-dimensional parallelepiped. Notice that length, area, volume are the "volumes" in 1-, 2-, and 3-dimensional spaces. A similar concept of volume exists for Euclidean space of any dimensionality.

He’s only defined the determinant as the volume, without showing how to actually calculate it. The second definition will allow us to do so, and next week we’ll see why.

Doctor Tom continued:

Okay. That's the geometric definition. I like it because I can make a mental picture of it. Here's the algebraic definition: I'll do it in 3 dimensions, but exactly the same idea works in any number of dimensions. Let's look at the determinant of this matrix:

| a11 a12 a13 |
| a21 a22 a23 |
| a31 a32 a33 |

The numbers after the "a" are the row and column numbers.

These are normally written as subscripts: $$\begin{vmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{vmatrix}$$

A permutation of a set of numbers is a re-arrangement. For example, there are 6 permutations of the list (1 2 3) (including the "re-arrangement" that leaves everything unchanged). Ignore for the moment the "+1" and "-1" after each one:

(1 2 3) -> (1 2 3) +1
(1 2 3) -> (1 3 2) -1
(1 2 3) -> (2 1 3) -1
(1 2 3) -> (2 3 1) +1
(1 2 3) -> (3 1 2) +1
(1 2 3) -> (3 2 1) -1

Now imagine that you start with three objects labelled 1, 2, and 3 arranged as they are on the left, and need to convert them to the order on the right, but you're only allowed to swap one pair at a time. To get to the final arrangement, you'll find that there are lots of ways to do it, but every way (for a particular rearrangement) always requires an even number of swaps or always requires an odd number of swaps. I've labelled those that always need an even number of swaps with +1 and those needing an odd number as -1 above.

For example, to change (1 2 3) to (3 1 2), you might swap 2 and 3 to get (1 3 2), then swap 1 and 3 to get (3 1 2). This is an even number of swaps, so it is an **even** permutation, and is labeled with \(+1\). To get from there to (3 2 1) would require a third swap, making it **odd**.

Now write down 6 products of the "a" terms, where the first number for each term is 1, 2, 3 and the second number is the rearrangement above for each of the six rearrangements. Here's what they are, in the same order as above. Be sure you understand this step:

a11*a22*a33
a11*a23*a32
a12*a21*a33
a12*a23*a31
a13*a21*a32
a13*a22*a31

The next to last row, for example, represents our (3 1 2), because the first indices are 1, 2, 3, while the second indices are 3, 1, 2.

The determinant is just the sum of all 6 terms, but put a "+" in front if the rearrangement is even, and a "-" in front if the rearrangement required an odd number of swaps. Here's the answer:

+a11*a22*a33
-a11*a23*a32
-a12*a21*a33
+a12*a23*a31
+a13*a21*a32
-a13*a22*a31

You could say that we are **adding** all the products of **even** permutations, and **subtracting** the products of **odd** permutations.

Here is how we evaluate the determinant for volume in the example above:

$$\displaystyle\begin{vmatrix}-3&0&2\\1&-1&2\\4&1&2\end{vmatrix}=$$ $$(-3)(-1)(2)-(-3)(2)(1)\\-(0)(1)(2)+(0)(2)(4)\\+(2)(1)(1)-(2)(-1)(4)=$$ $$6+6-0+0+2+8=22$$
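The same recipe can be sketched in Python for a square matrix of any size. This is only an illustration of the permutation definition, not an efficient method; the function names are my own, and I find the sign by counting inversions (pairs that are out of order), which has the same parity as the number of swaps.

```python
from itertools import permutations
from math import prod


def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one.

    The parity of the number of inversions (pairs that appear out of
    order) matches the parity of the number of swaps needed.
    """
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_by_permutations(matrix):
    """Sum the signed products, one product per permutation of the columns."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        # The row indices run 0, 1, ..., n-1; the column indices follow perm.
        term = prod(matrix[row][perm[row]] for row in range(n))
        total += permutation_sign(perm) * term
    return total


print(det_by_permutations([[-3, 0, 2], [1, -1, 2], [4, 1, 2]]))  # 22
```

Python counts from 0 rather than 1, so the permutation (3 1 2) in the text appears here as `(2, 0, 1)`.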

For a 4x4 matrix, there will be 24 rearrangements, like this: (1 2 3 4) -> (3 2 4 1) +1 ... so there will be 24 terms in the expression of the determinant. For a 5x5 matrix there are 120 rearrangements, so there will be 120 terms in the determinant, and so on. For an NxN matrix, there will be N! (N factorial) terms, where factorial means you multiply together all the terms from N down to 1. For example, 5! = "5 factorial" = 5x4x3x2x1 = 120.
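For readers who like compact notation, the whole recipe can be written as a single sum over permutations. The symbols \(S_N\) (the set of all permutations of \(1,\dots,N\)) and \(\operatorname{sgn}(\sigma)\) (the \(+1\) or \(-1\) label of a permutation \(\sigma\)) are standard notation that I am adding here; they are not in Doctor Tom's answer:

$$\det A=\sum_{\sigma\in S_N}\operatorname{sgn}(\sigma)\,a_{1\sigma(1)}\,a_{2\sigma(2)}\cdots a_{N\sigma(N)}$$

For \(N=3\) this sum has \(3!=6\) terms, which are exactly the six signed products listed above.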

Now, Doctor Tom has *declared* that these two definitions are equivalent, but not *proved* it; that fact is demonstrated, for two dimensions, in Polygon Coordinates and Areas, where we derive various formulas, one of which is our determinant. We’ll see a three-dimensional explanation later. For now, we are just trusting that he is right.

Two weeks later, we got a similar question, and Doctor Tom gave a deeper answer, because this was a teacher rather than a student:

Explaining the Determinant

I am trying to understand what the determinant of a matrix actually is. I have a degree in mathematics and am currently teaching Algebra II to gifted and talented students. I know how to find the determinant and how to teach the process of finding the determinant, but I haven't been able to explain what it is. Please help. Interestingly, I found in a History of Math text, From Five Fingers to Infinity, that before Arthur Cayley "created" matrix theory, Leibniz and a Chinese or Japanese mathematician simultaneously and independently discovered the determinant. How could the determinant have been discovered before the matrix? I guess I have two questions.

What is it, and how was it discovered without matrices, when the determinant is a property of a matrix? Two good questions!

Doctor Tom answered, starting with the geometric definition we saw above:

Hello Jeremy, I always think of it geometrically. Let's look in two dimensions, at the determinant of the following:

| x0 y0 | = x0*y1 - x1*y0
| x1 y1 |

Now imagine the two vectors (x0, y0) and (x1, y1) drawn in the x-y plane from the origin. If you consider them to be two sides of a parallelogram, then the determinant is the area of the parallelogram. Well, not exactly the area, the "signed" area, in the sense that if you sweep the area clockwise, you get one sign, and the opposite sign if you sweep it in the other direction. It's just as useful a concept as considering area below the x-axis as negative in your calculus course. Swapping the vectors swaps the sign, in the same way that swapping the rows of the determinant swaps the sign.

This is an algebraic property of determinants; so the two perspectives are compatible at least in this.

In one dimension, the determinant is just the number, but if you "plot" that number on a number line, it's the (signed) length of the line. If it goes in the positive direction from the origin, it's positive, and negative otherwise. In three dimensions, consider three vectors (x0,y0,z0), (x1,y1,z1), and (x2,y2,z2). If you draw them from the origin, they form the principal edges of a parallelepiped, and the determinant of:

| x0 y0 z0 |
| x1 y1 z1 |
| x2 y2 z2 |

is the volume of that parallelepiped. In higher dimensions, it's just the 4D (or 5D, or 6D ...) signed "hypervolumes" of the hyper-parallelepipeds.

It isn’t quite so clear how to tell whether a signed volume is positive or negative, but it can be done.

Taking that as the definition, we can derive algebraic properties, one of which we already saw for two dimensions.

With this view, it's easy to see why the determinant's properties make sense. Swapping two rows changes the order of sweeping out the volume, and will hence turn a positive volume to negative or vice-versa.

For example, here is the value of the determinant above, with the first two rows swapped:

$$\displaystyle\begin{vmatrix}1&-1&2\\-3&0&2\\4&1&2\end{vmatrix}=$$ $$(1)(0)(2)-(1)(2)(1)\\-(-1)(-3)(2)+(-1)(2)(4)\\+(2)(-3)(1)-(2)(0)(4)=$$ $$0-2-6-8-6-0=-22$$

It’s the negative of the original volume.

Multiplying all the elements of a row by a constant (say 2) stretches the parallelepiped by a factor of 2 in one direction, and hence doubles the volume.

Here I’ve doubled the length of the second row (vector) in the original determinant, which doubles the volume by stretching it in that direction:

$$\displaystyle\begin{vmatrix}-3&0&2\\2&-2&4\\4&1&2\end{vmatrix}=$$ $$(-3)(-2)(2)-(-3)(4)(1)\\-(0)(2)(2)+(0)(4)(4)\\+(2)(2)(1)-(2)(-2)(4)=$$ $$12+12-0+0+4+16=44$$

We can also multiply a *column* by a constant; here I’ve halved all *z*-coordinates from the determinant above, which halves the volume, back to what it was originally:

$$\displaystyle\begin{vmatrix}-3&0&1\\2&-2&2\\4&1&1\end{vmatrix}=$$ $$(-3)(-2)(1)-(-3)(2)(1)\\-(0)(2)(1)+(0)(2)(4)\\+(1)(2)(1)-(1)(-2)(4)=$$ $$6+6-0+0+2+8=22$$

Adding a row to another just skews the parallelepiped parallel to one of its faces, and hence (Cavalieri's principle) leaves the volume unchanged. (If you can't see this, plot it in two dimensions for a couple of examples.)

Here I’ve added the second vector to the first in that last one:

Because it has been skewed parallel to two of its faces, cross-sections parallel to either face (say, the green ones) retain the same areas, and the volume is unchanged:

$$\displaystyle\begin{vmatrix}-1&-2&3\\2&-2&2\\4&1&1\end{vmatrix}=$$ $$(-1)(-2)(1)-(-1)(2)(1)\\-(-2)(2)(1)+(-2)(2)(4)\\+(3)(2)(1)-(3)(-2)(4)=$$ $$2+2+4-16+6+24=22$$

Cavalieri’s principle has been mentioned in Volume and Surface Area of a Sphere – Without Calculus. It is easily understood if you imagine our figure as a stack of cards, and slide them in such a way that they retain the same area and thickness, but are stacked at a different angle. Here is an example of a cross-section of our figure, which has been slid upward parallel to the front edge:

I cut the figure with a light blue plane parallel to the green faces.
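The three row properties can also be checked numerically. The sketch below is my own illustration: `det3` is a hypothetical helper that hard-codes the six-term formula, and, unlike the step-by-step examples above, each operation is applied directly to the original matrix.

```python
def det3(m):
    """Six-term determinant of a 3x3 matrix whose rows are the three vectors."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i - a*f*h - b*d*i + b*f*g + c*d*h - c*e*g


original = [[-3, 0, 2], [1, -1, 2], [4, 1, 2]]
r1, r2, r3 = original

swapped = [r2, r1, r3]                              # swap the first two rows
scaled = [r1, [2 * x for x in r2], r3]              # double the second row
skewed = [[x + y for x, y in zip(r1, r2)], r2, r3]  # add row 2 to row 1

print(det3(original))  # 22
print(det3(swapped))   # -22: the sign flips
print(det3(scaled))    # 44: the volume doubles
print(det3(skewed))    # 22: the volume is unchanged
```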

Check the other allowed determinant manipulations to see how they relate to the geometry. Because a determinant is a fundamental geometric property of a collection of N N-dimensional vectors, it's not too surprising that different folks would stumble across it, even without knowing what a matrix is.

One place the determinant is used without formally thinking of matrices (specifically, as objects that can be added and multiplied) is in solving a system of equations using Cramer’s rule; the coefficients are already arranged in a rectangle, and their determinant arises naturally in the work of solving by elimination (along with the rules for manipulating them).

We’ve discussed in Polygon Coordinates and Areas how to use determinants to find the area of a triangle, and showed a couple of proofs. We don’t have a proof that the determinant actually gives the volume of a parallelepiped as we normally think of it; but we see here that the algebraic properties of the determinant are in agreement with the geometric properties of the volume, supporting the equivalence.

Next week we’ll see a fuller explanation of that volume, before comparing areas and volumes.


The question is from Conor, in early March:

Is there a formula to determine the amount of liquid needed for dilution? For example, say I have 500 ml of 75% ABV (150 proof) alcohol and need to add water to dilute it to 35% ABV (70 proof), is there a simple formula where I can just plug in the numbers to determine how much I should add?

“ABV” means “alcohol by volume”, and “proof” means twice that number (for interesting historical reasons, based on how it was tested).

We would often ask a “patient” to show what they have tried, so we can get a sense of what sort of help they need; but Conor indicated that he was not a student, teacher, or parent, but “other” – someone who just needs to use the math. So Doctor Rick gave a direct answer, teaching the “how” without withholding details for pedagogical reasons:

Hi, Conor, thanks for writing to the Math Doctors.

I will first show you how we can use a little math to create our own simple formula. Then I will explain why this simple formula is not really accurate, and an accurate formula would not be simple!

Sounds like a plan!

Let’s start with your example numbers and their meaning:

For example, say I have 500 ml of 75% ABV (150 proof) alcohol and need to add water to dilute it to 35% ABV (70 proof).

You have a total of 500 ml of an alcohol-water mixture. It is 75% alcohol by volume, so the amount of alcohol in the mixture is 0.75 × 500 ml = 375 ml.

You want to add more water to this alcohol (375 ml) so that the new mixture is 35% alcohol by volume.

Here is a picture, with red representing alcohol, and yellow representing water:

I’ll use a little algebra, which requires that I give a name to the unknown amount of water: let’s say I will add x ml of water. Then the total volume is (500 + x) ml, but the volume of alcohol is still 375 ml. I calculate the fraction of the new mixture that is alcohol, by dividing the volume of alcohol by the total volume; I want the result to be 35%, or 0.35:

375/(500 + x) = 0.35

This is how algebra is typically applied: We write an equation that represents the goal, and then we use algebra to solve it. We’ve seen how to calculate the percentage of alcohol if we knew that unknown amount of water; then algebra unwinds the calculation to find what the unknown has to be.

To solve this equation for x, I first multiply both sides of the equation by (500 + x):

375 = 0.35 × (500 + x)

Apply the distributive property to the right-hand side:

375 = 0.35 × 500 + 0.35 × x

375 = 175 + 0.35x

Subtract 175 from each side:

200 = 0.35x

Finally, I divide both sides by 0.35, which leaves x alone on one side; the other side is the value of x that makes the original equation true:

x = 200/0.35

x = 571.4

So you need to add about 571.4 ml of water to get 35% ABV.

We had to round the answer, which otherwise would have been 571.42857…!

Here is the solution:

This is the sort of algebra students generally learn first.

Let’s check that answer: after adding that volume of water, the total volume of the mixture is 1071.4 ml, of which 375 ml is alcohol. We calculate the percentage of alcohol:

375 / 1071.4 = 0.35 = 35%

It worked!

Or we could just check that 35% of 1071.4 is 375: \(0.35\times1071.4=374.99\), showing that the answer is not exact, but as close as we can get, rounded.

I recommend doing this sort of check whenever you care about getting the right answer. We worked backward to find the answer; now we work forward (which is more natural, and therefore more likely to be done correctly) to see if it works.
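The forward check is easy to automate. Here is a minimal Python sketch; the function name is my own, and it makes the same simplifying assumption as the algebra, namely that the volumes simply add.

```python
def abv_after_adding_water(volume_ml, abv_percent, water_ml):
    """Percentage ABV after adding water, assuming volumes simply add."""
    alcohol_ml = volume_ml * abv_percent / 100   # 375 ml in the example
    return 100 * alcohol_ml / (volume_ml + water_ml)


# Forward check of the rounded answer: 500 ml at 75% plus 571.4 ml of water
print(round(abv_after_adding_water(500, 75, 571.4), 2))  # 35.0
```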

Now, I will make this into a general formula. To do that, I replace the numbers 500 ml, 75%, and 35% with the variables \(V_i\) (the initial total volume of mixture), \(P_i\) (initial percentage ABV), and \(P_f\) (final percentage ABV). If I do all the same things I did with the numbers above, I will get the equation

$$\frac{P_iV_i}{V_i+x}=P_f$$

whose solution is

$$x=\frac{V_i(P_i-P_f)}{P_f}$$

This formula can also be written as

$$x=V_i\left(\frac{P_i}{P_f}-1\right)$$

Having done the work of solving with numbers, as sort of a dry run, we just have to do the same things with letters (which generally feels a lot less natural). Here is what the work looks like in this form:

$$\frac{P_iV_i}{V_i+x}=P_f$$ $$P_f(V_i+x)=P_iV_i$$ $$P_fV_i+P_fx=P_iV_i$$ $$P_fx=P_iV_i-P_fV_i$$ $$P_fx=V_i(P_i-P_f)$$ $$x=\frac{V_i(P_i-P_f)}{P_f}=V_i\left(\frac{P_i}{P_f}-1\right)$$

One more check to see that this works with your example: replacing \(V_i\) with 500, \(P_i\) with 75, and \(P_f\) with 35, I find

x = 500(75/35 – 1) = 571.4

Good, it worked!

This confirms that the algebra we did with the general formula agrees with the initial work for the specific example.
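For anyone who would rather plug numbers into a program than a formula, the general formula can be sketched as a small Python function. The function and parameter names are my own, the input check is an extra I have added, and the result carries the same simplifying assumption as the formula itself (volumes are taken to add exactly).

```python
def water_to_add(initial_volume, initial_abv, final_abv):
    """Volume of water to add, from x = V_i * (P_i / P_f - 1).

    The result is in the same unit as initial_volume. Both ABV values are
    percentages. Assumes the volumes of the liquids simply add.
    """
    if not 0 < final_abv <= initial_abv:
        raise ValueError("final ABV must be positive and no greater than the initial ABV")
    return initial_volume * (initial_abv / final_abv - 1)


print(round(water_to_add(500, 75, 35), 1))  # 571.4
```

Because only the ratio of the two percentages matters, the same function works if the strengths are given as proof (150 and 70) instead of ABV.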

Now for the bad news, though it may not be too bad. I made a simplifying assumption in the work above: that when you add two volumes of different mixtures (500 ml of 75% ABV and 571.4 ml of pure water, for instance), the volume of the resulting mixture will be the sum of the volumes I started with. I know this assumption is not always true, so I checked online, and Wikipedia had this to say:

Mixing two solutions of alcohol of different strengths usually causes a change in volume. Mixing pure water with a solution less than 24% by mass causes a slight increase in total volume, whereas the mixing of two solutions above 24% causes a decrease in volume…. Thus, ABV is not the same as volume fraction expressed as a percentage … defined as the volume of a particular component divided by the sum of all components in the mixture when they are measured separately.

This is a point I commonly make when teaching about this sort of problem in an algebra class: The textbooks typically make the assumption that volume is conserved without stating that it is only an assumption, and is false for many liquids. I tell students to make the assumption the book uses, but not to do the same in a chemistry class!

My work above assumed that ABV is the same as the volume fraction; notice that I added the volume x to the initial volume, assuming that would be the new volume. Now we see that my formula is not accurate. However, the article goes on to say:

The difference is not large, with the maximum difference being less than 2.5%, and less than 0.5% difference for concentrations under 20%.

So if you are OK with your result being off by as much as a few percent, you can live with the formula I showed you above. If not, it will take more work involving empirical data, such as the graph of “excess volume” shown in the Wikipedia article. You may be able to find a more accurate formula somewhere online, but it’s beyond the realm of pure math.

Here is the graph from the article:

To use this, it appears that we would need to convert our ABV to moles of alcohol per mole of mixture, and use the appropriate number from the graph to adjust the volume of mixture in our formula. I would also want to check my work against the statements about concentrations above and below 24%.
