First, we’ll look at this question from 1999:

Properties and Postulates

When people discover (or create) a property, do they just discover it ONCE and then know that from then on it applies to all similar situations, or do they just happen to keep discovering the property until they decide to call it a property? In other words, how long does it take for something to become a property?

COMMUTATIVE PROPERTY:

1. The commutative property of addition seems so intuitive and fundamental - (obviously if you have an apple, an orange, and a lemon in a box, you could also call them a lemon, an apple, and an orange and still be describing the same box) - that it is almost like a postulate. What distinguishes postulates from properties such as the commutative property of addition?

2. How can people be sure that properties such as the commutative property of multiplication, which are not as intuitive as the commutative property of addition, work in every case? (I think I found a way to prove this property - but I want to know if it is something that even NEEDS to be proven.)

Doctor Ian took this one, first looking at the history question (which, of course, varies a lot):

In theory, once you've discovered a property - that is, proved that some theorem is true - then you never need to discover it again. And anyone who is playing the same game as you (for example, standard number theory) can use your discovery to make more discoveries of his own. But that's in theory. In practice, you would have to publish your result, and other people would have to read and verify it. Gauss made several major discoveries that he wrote in his diary, and which only became known decades after his death, after some of them had been discovered independently by other mathematicians. Similarly, Isaac Newton invented the Calculus in order to prove to his own satisfaction that an inverse square law of force would result in an elliptical orbit, but he didn't tell anyone until Edmund Halley asked him about it many years later. In Germany, Leibniz, not knowing what Newton had done, invented it on his own, using a different notation. The Indian mathematician Ramanujan 'discovered' many results by some intuitive process that no one understood, but which clearly didn't involve the notion of 'proof'. So while he was able to report interesting discoveries, some turned out to be wrong, others had to be proven by mathematicians who couldn't have dreamed them up, and some remain unproven today.

The ideal process, however, is simple:

But these are all exceptional cases. Usually, a property becomes known when a mathematician either observes or guesses that some kind of pattern exists, proves that it does, tells other mathematicians about it, and has his proof verified by independent mathematicians. At that point, other mathematicians can treat it as if it were 'obviously' true.

Now, what about that commutative property? First, addition:

You're right that many properties seem so basic that it's tempting to think of them as postulates. But one of the primary differences between postulates and properties is that the number of postulates in a given formal system either stays the same or decreases, while the number of properties continues to grow. Properties are knowledge, and knowledge is power, so you want to have as many properties as you can find. But postulates are assumptions, so you want to have as few of them as you can get away with. That's why, for centuries, mathematicians tried to 'prove' Euclid's parallel postulate using the other postulates as a starting point. And that's why mathematicians find it preferable to prove things like the commutative property of addition, even though from a certain point of view, proving something so obvious seems like a waste of time.

As we saw previously, postulates are the facts you take as your starting point; ideally, they should be minimal, so that as much as possible is proved. Theorems are properties or facts that have been proved from the postulates or other theorems.

Kiki has the common impression that a postulate is any fact that is obvious, so that it doesn’t need proof to be accepted. She sees the commutative property of addition as obvious, but not so for multiplication. But really, both can be demonstrated in almost the same way, by just looking at the same object from two different perspectives:

By the way, I don't agree with you that the commutative property of multiplication is any less 'intuitive' than the commutative property of addition. Visually, you can represent a sum of two numbers like this:

    +--+---+
    |**|***|
    +--+---+

Flip it around, and you get

    +---+--+
    |***|**|
    +---+--+

Since it's the same object, the order of the operation can't matter.

Similarly, you can represent the product of two numbers like this:

    * * * * *
    * * * * *
    * * * * *

Rotate it 90 degrees, and you get

    * * *
    * * *
    * * *
    * * *
    * * *

Again, since it's the same object, the order of the operation can't matter.

But a *proof* has to be based on previously known facts *within math*, so (a) the demonstrations above can’t be thought of as proofs, making the properties theorems, and (b) both might be thought of merely as justifications for accepting these properties as postulates.

But it's important to remember that a picture isn't a proof. A picture shows that something is true for one particular case, while a proof shows that it is true for all possible cases. When you want to convince yourself of something, a picture is often good enough. But when you want to prove it to someone else - especially someone who might be using it as a starting point for discoveries of his own - you have to meet a higher standard. Also, if you remember how Russell's paradox led to a re-examination of the foundations of mathematics, you'll see that we often learn the most by trying to really understand the simplest cases, rather than the more complicated ones.

We don’t really have any postulates yet on which to build proofs, because the choice of postulates is not a matter of believing what’s “obvious”, but of constructing a system with a minimal set of postulates. Are there any *more basic* ideas from which both of these properties might be derived?

In fact, this has been done. Here are a few answers (all by Doctor Rob) about one well-known set of axioms for the natural numbers, how they are used to prove theorems such as the commutative property, and how to extend that to other numbers:

Proof that 1 + 1 = 2
Proving the Properties of Natural Numbers
Real Numbers
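To make the idea of building arithmetic on a few postulates concrete, here is a minimal sketch in Python. It is not Doctor Rob's presentation: it assumes one common formulation of the Peano postulates in which the natural numbers start at zero, and the names `Nat`, `succ`, `add`, and `from_int` are invented for illustration. Addition is defined only by the two rules a + 0 = a and a + S(b) = S(a + b), and 1 + 1 = 2 then follows by unwinding those rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A natural number: zero (pred is None) or the successor of pred."""
    pred: Optional["Nat"] = None


ZERO = Nat()


def succ(n: Nat) -> Nat:
    """Return the successor S(n)."""
    return Nat(n)


def add(a: Nat, b: Nat) -> Nat:
    """Addition by recursion on b: a + 0 = a, and a + S(b) = S(a + b)."""
    if b.pred is None:
        return a
    return succ(add(a, b.pred))


def from_int(k: int) -> Nat:
    """Build the Nat obtained by applying succ to ZERO k times (helper for checks)."""
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n


ONE = succ(ZERO)
TWO = succ(ONE)

# 1 + 1 = 1 + S(0) = S(1 + 0) = S(1) = 2
print(add(ONE, ONE) == TWO)

# Spot-check of commutativity on a few cases. Like a picture, this is
# evidence, not a proof: a proof needs induction over all naturals.
print(all(add(from_int(a), from_int(b)) == add(from_int(b), from_int(a))
          for a in range(8) for b in range(8)))
```

The second check only samples finitely many pairs, so it plays the role of the pictures above; the actual proof of commutativity from these rules proceeds by induction.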

These explanations were the basis of further questions in 2003, similar to questions we have previously looked at for geometry:

Flavors of Facts

Is it a fact that 1 + 1 = 2? I have seen your proof using the Peano postulate. Is the postulate a hypothesis which is unproven, or is it proven, i.e., a fact? For example, 1 + 1 = 10 in base 2. So is the value of 1 + 1 open to interpretation? I think I find some of the terminology confusing, e.g., what do we really mean by the terms 'fact', 'premise', 'assumption', 'axiom', 'postulate', and so on?

Karen is asking about the set of postulates Doctor Rob had explained (the Peano postulates for the natural numbers), and the resulting theorem that \(1 + 1 = 2\). Does the fact that using a different base changes the result mean that the postulates aren’t universally true? And how are all these different kinds of fact related?

I answered this one:

You'll have to decide for yourself what you mean by "actual fact"; philosophers can have trouble pinning that down! But consider that every thought has to be based on something else; there has to be some starting point, since you reason ABOUT something. So in order to say something is true, you have to believe that its premises are true first. That might be an observation (but how do you know that your observations are true?); or it can be an "assumption", something we take as true as the basis of our reasoning.

This is the same thing we have said about postulates; they state the defining assumptions of a field of math. We “postulate” them (take them to be true for the sake of argument) either because they appear to be basic facts from our observation, or because we just decide to “suppose” them.

That is how we think of math: we choose some set of axioms (or postulates, which are the same thing) and definitions as our starting point, the things we are thinking about. We can choose different axioms and come up with different mathematical systems (such as Euclidean or non-Euclidean geometry). But once we choose them, we consider them to be true -- within the particular system we are working on. An axiom or postulate can't be proven, since there is nothing before it on which to build a proof; it stands at the base of the mathematical system that is built on it. So it is an assumption. But that doesn't make it untrue; it is the truest thing there is _within that system_, the basis of the whole construction. Outside of that system, there is really nothing to tell you whether it is true or false. But we are not "assuming" something about some entity outside of the system, that might really be true or false; rather, we are "assuming" something only in the sense of deciding what it is that our system is about.

As we’ve seen before, when we apply our theory to something in the real world, we need some basis for thinking that the postulates apply to that something; but within the math itself, we don’t concern ourselves with that.

But your example of 1+1=10 is not really an illustration of any problem with axioms and assumptions. It is nothing more than notation: the numeral 10 in binary is just a different way to WRITE the number 2. The fact you are stating MEANS exactly the same thing as 1+1=2. On the other hand, there is a system (modulo 2 arithmetic) in which 1+1=0. That is no less true than 1+1=2, within its system; but the meanings of 1, +, and = are different than in normal arithmetic. We are talking about different things, based on different definitions. One is a fact about integers, the other is a fact about modulo-2 numbers. One doesn't contradict the other; they just live in different worlds of thought, which are built on different definitions and axioms.

Modulo 2 arithmetic works with a different kind of “number”; it has different definitions and assumptions than integer arithmetic. There are only two numbers: 0 represents any even number, and 1 represents any odd number. In effect, \(1 + 1 = 0\) means “odd + odd = even”. The addition is different from what we are used to, because we are adding different things than we are used to.
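The difference between a change of notation and a change of system can be seen in a few lines of Python. This is only an illustrative sketch; the helper names `binary_numeral` and `add_mod2` are made up here. The first function writes the same integer in a different notation, while the second defines a different addition on the two values 0 and 1.

```python
def binary_numeral(n: int) -> str:
    """Write the non-negative integer n in base 2 (same number, different notation)."""
    return format(n, "b")


def add_mod2(a: int, b: int) -> int:
    """Addition in modulo-2 arithmetic, where 0 stands for 'even' and 1 for 'odd'."""
    return (a + b) % 2


# Ordinary integer addition; only the way the answer is written changes.
print(binary_numeral(1 + 1))   # the numeral "10"
print(int("10", 2) == 2)       # reading "10" in base 2 gives the integer 2

# A different system with a different "+": odd + odd = even.
print(add_mod2(1, 1))
```

In the first case the fact is still 1 + 1 = 2; in the second, the symbols 1 and + mean something else, so 1 + 1 = 0 is a fact about a different system.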

So to answer your basic question, yes, 1+1=2 is a fact--given that 1 and 2 refer to the integers 1 and 2, and that + and = have their normal meanings. All the axioms and definitions on which the real number system is based are assumed when I say that! If you don't make some such assumptions, then "1+1=2" has no meaning; it is just a string of symbols on your computer screen, and can't be said to be either true or false.

Karen wrote back and asked about my use of the words “premise” and “assumption”; I just looked them up in the dictionary, and found that they mean what I meant: a premise is the basis of a logical argument, and an assumption is a fact that is not proved. As I concluded,

They're two halves of the same fact, that something is assumed without proof, so that something else can be proved.

Karen presumably asked her question from the perspective of a student who has only been taught about these facts as “properties”, and is wondering how they fit into the larger mathematical world of axioms and theorems that she has just discovered. The following question, from 2014, takes us to the higher perspective of someone who is learning about “abstract algebra”, which studies things similar to numbers that follow some subset of the rules of real numbers, called “groups”, “rings”, and “fields”. (This includes modular arithmetic, and many other ideas.)

Properties? Axioms? What to Call Characteristics of Field, and When

Are the characteristics of fields and rings best referred to as axioms or as properties? This is an odd question, I suppose; but I have looked at several sources -- and have found different answers. It seems like many high school and college math texts begin with the number system, and present these "characteristics" to students right away. A college algebra textbook I have right now, for example, calls these properties. That is the term I use for associativity, commutativity, identity, inverse, and distributivity. On the other hand, the Wolfram math website uses the term "field axioms." So, what exactly are they? I never paid much attention to this stuff in the past, but after reading Berlinski's book on "absolutely" elementary mathematics, I found it fascinating. I have no illusions about mastering group/field theory, but this little problem of terminology is like having a stone in my shoe. My guess is that these are properties because they can be proved with mathematical induction. Or not?

I responded:

I think what you are finding is that the same topic can be approached from several different directions: it is not that these are NOT axioms, but rather that not everyone needs to talk about axioms. Properties only need to be called axioms when you are taking an axiomatic approach to the subject. And axioms are a starting point in developing a mathematical concept abstractly; they tell us what we are working with. In this case, a field is defined as anything that satisfies a certain set of properties, which we call axioms because they are the basis of proofs, and are not proved WITHIN the development of the concept, but are used to prove theorems about these entities. We can abstractly prove theorems that apply to ALL fields by relying only on the axioms that define a field in our proofs.

So, in field theory, we start with a set of axioms that define what a field is; any set of entities that satisfy those axioms is a field, and all theorems about fields apply to it. In proving these theorems, we take the axioms as given.

One example of a field is the set of real numbers.

But when we APPLY the concept of a field more concretely, we show that some particular entity IS a field by proving that it satisfies all the axioms. For example, we show that the real numbers form a field by proving that the axioms are true OF the set of real numbers; for this, we might start with some more fundamental axioms that define natural numbers, and work up to rational and then real numbers by introducing additional definitions and axioms. In this process, we do not think of the properties we are proving as axioms, but just say these are provable properties of the particular field we are talking about. Furthermore, in relatively elementary presentations of algebra -- not abstract algebra, which deals with general entities such as fields, but just working with real numbers and variables -- we don't need to even mention the word "field," or talk about axioms. That would only scare off most students who are not ready for high levels of abstraction. We would just say these are properties of the real numbers, and not bother to prove them. "College algebra" is still elementary in this sense, so they would not need to use the word axiom (unless they choose to mention in a sidebar that these properties apply to other kinds of systems that students will meet later).
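For a small finite system, showing that a particular entity IS a field can even be done by brute force. The sketch below is an illustration under assumptions, not part of the original answer: it takes the integers modulo n with the usual modular addition and multiplication, and the function name `is_field_mod` is invented. It checks each field axiom (the same "properties" listed in the question) for every combination of elements.

```python
from itertools import product


def is_field_mod(n: int) -> bool:
    """Check by exhaustion whether the integers modulo n satisfy the field axioms."""
    if n < 2:
        return False  # a field needs distinct additive and multiplicative identities
    elems = range(n)

    def add(a: int, b: int) -> int:
        return (a + b) % n

    def mul(a: int, b: int) -> int:
        return (a * b) % n

    for a, b, c in product(elems, repeat=3):
        # commutativity of both operations
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
        # associativity of both operations
        if add(add(a, b), c) != add(a, add(b, c)):
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False
        # distributivity of multiplication over addition
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False

    for a in elems:
        # identities: 0 for addition, 1 for multiplication
        if add(a, 0) != a or mul(a, 1) != a:
            return False
        # every element has an additive inverse
        if not any(add(a, b) == 0 for b in elems):
            return False
        # every nonzero element has a multiplicative inverse
        if a != 0 and not any(mul(a, b) == 1 for b in elems):
            return False
    return True


print(is_field_mod(5))  # integers mod 5
print(is_field_mod(4))  # integers mod 4: 2 has no multiplicative inverse
```

Because the set is finite, checking every case really does amount to a proof here, unlike a picture of one case. For the real numbers no such enumeration is possible, which is why the properties have to be proved from more basic axioms.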

So within a course in (real number) algebra, we just talk about properties; in a course of “analysis” we prove these properties; and in a course on abstract algebra, we take the properties as axioms, which can’t be proved (and perhaps use the real numbers as examples).

You might diagram these ideas in this way:

     ________________________
    (                        )
    (         field          )
    (________________________)
       ^        ^        ^
       |        |        |
     axiom    axiom    axiom
       |        |        |
     __|________|________|___
    |  |        |        |   |
    | ppty     ppty     ppty |
    |                        |
    |      real numbers      |
    |________________________|
       ^        ^        ^
       |        |        |
     axiom    axiom    axiom

What we think of as an axiom when we are looking from above, using it as a foundation for an abstract concept, is a property when we look at it from within the study of a particular example.

The idea here is that the concept of field is represented by a cloud, up in the air, abstract, whose properties are defined by the axioms that undergird it; the real numbers themselves have properties that can be proved by their own axioms (as I described above), but which in turn make them a field, so that all theorems proved about fields apply to them.

We’ll first look at a simple version of the question, and then one that takes us deeply into geometry. First, from 2006:

Theorems and Postulates

If SSS, SAS, and AAS are theorems, why do other books still use them as postulates? And can you show me the PROOFS that were used for these theorems? :) Sorry, I know you've answered questions about the topic many times but as I was reading the answers I realized that you were trying to say that SSS, SAS, and AAS are really theorems because they were proved (theorems need proofs). But, why do books and even teachers still teach students like us SSS, SAS, and AAS "POSTULATES". Do the words "theorem" or "postulate" really matter? I am a high school student studying geometry right now and your answers to my questions would be a great help for me.

Below, we’ll be looking at one of the earlier answers referred to here. But Alona is probably referring specifically to this page from 2001:

AAA, ASS, SSA Theorems

Can you please tell me in detail why the ASS, SSA, and AAA postulates can't be used to determine triangle congruence?

Doctor Jubal started his answer by saying,

Just a side note: the SSS, SAS, and ASA triangle congruency theorems are theorems, not postulates. A postulate is something you just state and assume to be true. A theorem is something you can prove, based on your postulates.

Evidently Alona’s textbook calls these postulates, while Doctor Jubal’s calls them theorems. So what are they really?

Having already written about this, I gave Alona a succinct response, also referring to a page we looked at last time, and the page we’ll look at below:

These facts CAN be postulates, but they don't have to be. It's a matter of how an author chooses to present geometry to his audience. Different geometry texts choose different starting points. The best way to do geometry is to start with as few assumptions as possible, and prove everything from those. Many texts "cheat" a bit by using as postulates anything they don't want to bother proving (probably because the proofs are difficult and wouldn't really help their students understand the subject). Others use a good, small set of postulates, but state some theorems without proof, explaining that the proof is beyond the level of the text. I prefer the latter approach, but I can understand the "cheating". It is possible to take ONE of these congruence facts as a postulate and prove the others from it (so they become theorems). It is also possible to define congruence in such a way that all three can be proved from some more basic postulate about congruence. You can take them however your own text presents them; but be aware that they are all really equivalent facts, and which you take as postulates doesn't affect how you use them, which is what really matters. In other words, in answer to your question as to whether "theorem" or "postulate" really matters: it matters in presenting a specific systematic treatment of geometry, but not in USING the facts you learn, which are true one way or another regardless.

This is the key idea that any student should come away with, and is why I am covering this answer first: Whatever starting point an author chooses, once the theorems have been proved it doesn’t matter whether a given fact was stated as a postulate or as a theorem; it is still equally true.

Alona had a follow-up question:

Dear Dr. Math, thank you very much for answering my past question about "theorems" and "postulates". I now know that one of the SSS, SAS and ASA theorems can be considered as a postulate. It just depends on the starting point of the discussion. But they are really theorems (I hope I understood it the way you want me to understand it). My question is, is it really possible to prove theorems from theorems? What I mean is, is it possible to call all the three congruency theorems "theorems" and still prove each one using each theorem? In our geometry class, it is possible to prove theorems from previous theorems. Then why do we need to assume one of the congruency theorems as "postulate" when we could really prove it using "theorems". Thank you very much for your previous reply. It answered 70% of my questions. These questions are the remaining 30%. :)

I replied, pointing out the necessity for *something* to be a postulate, to avoid circularity:

Certainly you can prove a theorem from a theorem; you do it all the time, I would think. You can use any known fact, whether theorem or postulate, as the basis for a proof. What you CAN'T do is prove A from B, and B from C, and C from A! Such circular reasoning is not allowed, because you have to start with something that is known to be true. So if you call ALL THREE of these "theorems", then at least one of them has to be proved on the basis of something else (such as a definition of congruency that is more powerful than what elementary texts usually use). That's why the best approach is to take one of them as a postulate, and then prove the others as theorems. It doesn't matter which one you start with, but you have to start with one without assuming another is already true. If you do merely prove each from another of them, then what you have done is to show that they are all EQUIVALENT--that is, IF one is true, then they all are. But then either they are all true, OR they are all false. You don't know which! I think the pages I referred you to answer this question, by explaining the role of postulates as starting points. You may want to reread them with this new perspective in mind.
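The point about circularity can be pictured as a question about which statements are ever reached when you start from the postulates and follow the proofs. The following Python sketch is a toy model with invented names (`established`, and a hypothetical chain in which SSS is proved from ASA, ASA from SAS, and SAS from SSS); it does not describe how any particular textbook orders these proofs.

```python
def established(postulates: set[str], proofs: dict[str, list[str]]) -> set[str]:
    """Return every statement that can be reached from the postulates.

    `proofs` maps a statement to the list of statements its proof relies on.
    A statement counts as established only once all of its premises are.
    """
    known = set(postulates)
    changed = True
    while changed:
        changed = False
        for statement, premises in proofs.items():
            if statement not in known and all(p in known for p in premises):
                known.add(statement)
                changed = True
    return known


# Hypothetical circular chain: each fact is proved from another one.
circular = {"SAS": ["SSS"], "SSS": ["ASA"], "ASA": ["SAS"]}

# With no postulate to start from, nothing is ever established.
print(established(set(), circular))

# Take one of them as a postulate, and the other two follow as theorems.
print(sorted(established({"SAS"}, circular)))
```

With an empty set of postulates the loop never gets started, which is the situation of knowing only that the three facts are equivalent. Choosing any one of them as the starting point establishes all three.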

With all this in mind, let’s take a look at that earlier discussion from 2002:

Theorem or Postulate?

I am homeschooling my daughter in math. I want to teach her proofs, which have just been removed from the high school curriculum. My problem is that there seems to be a great discrepancy between definitions, postulates, and theorems, from textbook to textbook. What is a postulate in one book is a theorem in the next, and vice versa. Is there a textbook that faithfully follows Euclid's Elements? I have the translation, but the language is quite formal and of course there aren't any problems to work out. Your help would be most appreciated in this matter.

Maureen wants to get it right! In order to help, I dealt with both the general ideas discussed above, and some particulars about differences among textbooks:

It's important to be aware that there is no one "correct" axiomatization of geometry; a number of different schemes have been developed by reputable mathematicians, not to mention by textbook authors who are trying to keep things simple for students. Certainly the variety of postulates used in texts makes it hard for me as a Math Doctor to answer questions about proofs, since I have to ask what facts are available to the student; but in a sense that helps in our mission, since our goal is to help in understanding, not to give specific answers. By making the student look for whatever postulate or theorem in their book corresponds to a fact I mention, I help them learn how to put ideas together. That's relevant to you, because your goal is not to teach a particular "correct" set of postulates, but to teach reasoning (using whatever postulates and theorems are available) and geometric facts (all of which are agreed upon, even if we don't agree on which to call postulates and which to call theorems, or on what the theorems are ultimately based on). The educational results don't really depend on the details of the system used.

Mathematicians spend time rethinking known math, looking for new proofs and new ways to organize old information. Euclid’s *Elements*, the original geometry “textbook”, had its faults, and modern mathematicians have searched for better sets of postulates and definitions to start with (more consistent, more complete, and so on), so they typically have longer lists of postulates than Euclid (who sometimes didn’t recognize that he was using a fact he’d neglected to list). Meanwhile, textbook authors want to avoid making the subject too complex, so they try to simplify the list of postulates.

It is definitely not true that Euclid is the one right way to do geometry. Despite the centuries when the Elements were treated as the Bible of geometry, there are many flaws in his treatment, and many new ideas have been introduced since then. For example, Euclid does not use the concept of "congruence" as we understand it today. So although going through Euclid can be a very enlightening experience, it need not (and probably should not) be a student's first exposure to axiomatic geometry. So I would recommend not looking for the text closest to Euclid, but for the text that best presents the concepts and demonstrates how proofs work.

Maureen responded with a more specific question about those congruence postulate/theorems:

Thank you for your response. However, I have many more questions. In all the textbooks that I am perusing I notice that there are three triangle postulates - SSS, ASA, and SAS. Shouldn't these be theorems? Are they postulates because the proof is beyond the scope of the high school student, or are they postulates because they cannot be proved?

This is a very perceptive question; you will recall that I said in 2006 that some authors “cheat” by adding postulates just to avoid having to prove them, just as she suggests here. I responded with a short description of different options:

As I said before, there are different axiomatizations of geometry. Some of them prove SSS, SAS, and ASA as theorems, as Euclid does; to do so properly they need a better definition of congruence than Euclid's concept of "superposition," with a clear set of axioms governing the motions that are allowed when one shape is put on top of another. Hilbert's famous axiomatic system takes SAS as a postulate, and derives SSS and ASA from it as theorems. One textbook I have makes ASA and SAS postulates, but proves SSS. Many texts seem to make all three [postulates], probably, as you suggest, in order to avoid confusing students with a rigorous proof when they are not ready for such details. (I would prefer to present a good set of postulates and just say honestly that we are skipping the proof of some of these theorems because they are too difficult to go into yet.) In each case, whatever is chosen as a postulate is such because it can't be proved _from the other postulates that have been chosen for that presentation_. Some of these systems are clearly deficient, but it can be hard to say which is best.

Then I gave a bunch of references to discussions of various systems. Many of these references are now dead links (it’s been 16 years), so I’ll update them here.

First, for information about the benefits and deficiencies of Euclid’s postulates, you can start with the premier Euclid site, David Joyce’s Euclid’s Elements. Commentaries on each proposition discuss the flow of thought and issues Euclid neglected. For an examination of the book, with a lengthy discussion of its limitations, see Euclid and the Elements.

To compare Euclid’s postulates with a few modern sets of axioms, you can look at Wikipedia on Hilbert’s axioms, Birkhoff’s axioms, and Tarski’s axioms. These are very different in their purpose and form. Many modern school texts, to the extent they use postulates or axioms at all, appear to take ideas from Birkhoff.

For an interesting look at how Euclid can be taught with the benefits of modern thought, see Teaching Geometry According to Euclid by Robin Hartshorne (who clearly disagrees with Birkhoff’s approach).

An excellent reference I gave was to Jim Loy’s site, which unfortunately does not exist now (except in the Wayback Machine). But the page I linked to, which shows how you can start with any of the congruence facts and derive the others (that is, they are all equivalent) was borrowed in this PDF, along with Joyce’s commentary on Euclid’s proof of SAS (which used an “obvious” fact that Euclid neglected to list as a postulate, making his proof invalid).

The next thing on this page is my response to a paragraph that was somehow omitted when this discussion was archived; I looked this up because what I wrote seemed like a non-sequitur. Here is the missing paragraph from Maureen:

As a math major and an algebra teacher I am finding this discrepancy among geometry texts to be a bit of a problem for any student who changes schools mid year. My friend's son has to prove that all right angles are equal. He seems to feel that that is so obvious that it should be a postulate. Euclid's 4th postulate does in fact state that all right angles are equal. However, the congruence postulates aren't nearly as obvious and yet they are postulates. As a math major I took a course in non-Euclidean geometry when I was in college. This course begins with the controversial parallel postulate which Euclid and many others tried to prove. It only became a postulate because Euclid couldn't prove it. I find that I am really surprised to see such an arbitrary treatment.

Here is my response:

It is true that many of the first theorems students are asked to prove are so obvious that they seem not worth proving; but that is only because they are deliberately easy, and are little more than a concatenation of postulates and definitions. That may well leave some students wondering why we need proofs, when they are so trivial. They need to be introduced early to a surprising proof, even if they can't follow it all yet, just so they can see the value of proofs. And they should also be shown a false proof, so they can see the need for care in each step, and will not think that the "obvious" is always true. The goal of an axiomatic system is to reduce the number of assumptions we make to a minimum (as far as possible) so that all our reasoning is based on readily accepted 'facts'. So postulates should be as 'obvious' as possible; yet the fact that something seems obvious is not enough to make it a postulate, since it may be provable from existing postulates, so that it would be redundant. On the other hand, it is not required that we prove a set of postulates is minimal in order to use them. As the last link I gave above mentions, SSS, SAS, and ASA are all equivalent postulates, so that only one of them need be postulated; but it is common to make them all postulates, and that is not illegal, just unnecessary. The problem is that this produces a bloated set of postulates and gives a false sense that geometry has to make unnatural assumptions.

Finally,

You will note that although Euclid's fifth postulate is such because it could not be proved from his other postulates, there have been many alternative ways to phrase it in order to make it seem less arbitrary. Any of those versions is a valid postulate. Math is at root a somewhat arbitrary endeavor; there are many ways to choose starting points for the same field. Yet ultimately it makes no difference; who cares whether SAS is a postulate or a theorem, when you need to use it in a later proof? You just write SAS and know that it is true, one way or the other. So although the variations in postulates will make a difference in the details of a student's work, and might cause some confusion if he moves to a different text in the middle of a course, none of the important things is affected: all the same facts are true, and the importance of proof is still being demonstrated.

Let me add here another bit that was not included in the archive, namely an additional answer given by Doctor Fenton focusing specifically on the question of how to teach Euclid:

I'll also throw in my $.02. The American Mathematical Society also publishes a high-school text "Basic Geometry" by George D. Birkhoff, one of the top mathematicians of the Twentieth Century. It isn't a presentation of Euclid, but it is a logically rigorous modern development based on Euclid. There are logical problems in Euclid; he makes implicit assumptions (e.g. about properties of "between"), and his first proof is invalid, because he assumes that two circles which appear to intersect do in fact intersect - but there is no postulate guaranteeing that. Birkhoff simplifies Euclid to some extent by introducing Ruler and Protractor Postulates, so that it is not a purely synthetic development. It's inexpensive ($26 for the text, and $9 each for the teacher's manual and answer book). I think it would be worth considering. Another text to consider would be Robin Hartshorne's "Companion to Euclid" (if you can find it - it appears to be out of print, but you might find it through online used book sellers) or one of his texts "Geometry" or "Euclid and Beyond". One of those is an updating of the "Companion to Euclid". It may be college level, but it does have problems and explains the shortcomings of Euclid in addition to presenting him.

Hartshorne is one of the authors I referred to above.

The same page contains a follow-up question from a reader in 2003, which I don’t have room for here. It deals specifically with the question of “superposition”, which is the unsupported “fact” Euclid used in “proving” SAS.

I was reminded of this topic by a recent questioner, who said about SSS, SAS, etc., “What I do not understand is why I should believe that these are true. They were never explained to me, just listed as ways to prove triangles congruent.” She then referred to our page

Congruence and Triangles

where Doctor Guy, in 1997, said in passing,

SSS: the letters stand for "Side-Side-Side". What that means is, if you have two triangles, and you can show that the three pairs of corresponding sides are congruent, then the two triangles are congruent. This is a postulate, not a theorem, meaning that it cannot be proved, but it appears to be true so everybody accepts it.

What?! All of geometry is built on statements that we just *think* are true, without proof? I thought math was all about proof and certainty!

There’s a lot more to be said to expand that passing comment, which is technically correct but quite misleading.

We’ll start with this question asked by Julia in 2003:

The Role of Postulates I'm in Euclidean Geometry and the teacher said that theorems are proven; postulates are not. Why? Who decided what were postulates and what were theorems? I asked my teacher if postulates *could* be proven and simply weren't, and she said that they couldn't be proven. This is my current question. Postulates come first, and then theorems are formed from those postulates (right?). So the entire geometry is based on postulates that weren't and can't be proven. That just doesn't seem right to me. Could you explain to me why it's okay that they're not proven?

What it means to say that a postulate can’t be proved is a little subtle; why that isn’t a problem is another question. The answer will take us into the depths of what mathematics is!

I started out this way:

The basic answer to your question is that we have to start somewhere. The essence of mathematics (in the sense the Greeks introduced to the world) is to take a small set of fundamental "facts," called postulates or axioms, and build up from them a full understanding of the objects you are dealing with (whether numbers, shapes, or something else entirely) using only logical reasoning such that if anyone accepts the postulates, then they must agree with you on the rest.

So math is a process of reasoning from some basic assumptions to derive all that can be said about the subject of those assumptions. Those facts we start with are the *postulates* (as they are traditionally called in geometry) or *axioms* (as used in much of the rest of math). But on what grounds are they assumed, and why can’t they be proved?

Now, these postulates may be (and were, for the Greeks) basic assumptions or observations about the way things really are; or they may just be suppositions you make for the sake of imagining something with no necessary connection with the real world. In the first case, we want to choose as postulates facts that are so "obvious" that no one would question them; in the second case, we are free to assume whatever we want. In both cases, we want a minimal set of postulates, so that we are assuming as little as possible, and can't prove one from another.

So some math is intended to model the physical world, and we take our starting point there — for example, observations about how lines and points work on a flat surface. Other math is just a “what if” exploration, so we start with a mere supposition and see what would result if it were true. (Sometimes we later find an application, in which it actually *is* true; but that doesn’t change the math itself.) The important thing either way is that we choose our postulates carefully so that we don’t have to assume more than necessary; it should be as easy as possible to decide whether the math we’re doing applies to a given situation.

Euclid's problem was that one of the postulates (the fifth) didn't seem simple enough, so people over the centuries tried to prove it from the other postulates, rather than be forced to accept something that didn't seem immediately obvious. Eventually it was realized that there are in fact different kinds of geometry, some of which don't follow all of Euclid's postulates; and that you could replace his parallel postulate with a contradictory assumption and still have a workable system. In particular, spherical geometry - the way things work on a sphere, if you think of a "line" as a great circle - is a very real example of this, in which parallel lines just don't exist. Spherical geometry follows different rules, yet is just as valid as plane geometry.

The fifth postulate (called the Parallel Postulate) is, as Euclid stated it, “If a straight line falling on two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than the two right angles.” Others since have identified many similar facts that could be put in place of this, such as Playfair’s axiom, “In a plane, given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.” What mathematicians in the 19th century discovered was that it couldn’t be dropped, because it was independent of the other postulates; and replacing it with an alternative (such as having no parallels, or many parallels) did not lead to contradictions.

This example illustrates the fact that postulates are true only in the particular “world” that the math deals with. A postulate can be replaced with another, and the resulting math is still just as valid — it just tells us about a different “world”, such as a spherical one rather than a flat one.
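As an illustration that is not part of the original exchange, here is a minimal Python sketch of how the spherical "world" reaches a different conclusion from the flat one. In plane geometry the angles of a triangle sum to 180 degrees; the sketch measures the angles of a triangle on the unit sphere whose sides are great-circle arcs. The function names (`tangent`, `vertex_angle`, `angle_sum`) and the choice of the "octant" triangle are assumptions made for this example only.

```python
import math

def dot(u, v):
    """Ordinary dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))

def tangent(a, b):
    """Unit direction at point a, along the great circle from a toward b."""
    d = dot(a, b)
    t = [bi - d * ai for ai, bi in zip(a, b)]  # remove the component along a
    n = math.sqrt(dot(t, t))
    return [ti / n for ti in t]

def vertex_angle(a, b, c):
    """Angle (in degrees) of the spherical triangle at vertex a."""
    cos_angle = dot(tangent(a, b), tangent(a, c))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def angle_sum(a, b, c):
    """Sum of the three vertex angles of the spherical triangle abc."""
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

# One eighth of the sphere: a pole and two points on the equator.
octant = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(angle_sum(*octant))  # about 270 degrees, not 180
```

Each angle of this triangle is a right angle, so the sum is about 270 degrees. Nothing has gone wrong with the reasoning; the starting assumptions are simply those of a different geometry.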

So we have to take as our starting point some postulates that simply define the particular mathematical system we are studying. If we take a different set of postulates, we get a different system, which may be just as useful as the original - and therefore just as "true" - yet different in its conclusions. The postulates we choose are the connection between the abstract concepts about which we are making proofs, and the "real world" ideas that they model (if any). Without postulates, we would not have such a connection, and would be reasoning about nothing!

Note that, since postulates form the “ground floor” of a mathematical system, there is nothing before them from which they *could* be proved! A proof would require going *outside the system*, and therefore would not be a *mathematical* proof. And that is why postulates can’t be proved.

Here is a similar question from 1999, considering not only postulates, but also undefined terms:

Unproven Fundamentals of Geometry I was inspired by some of the answers in your archives to further investigate why the fundamentals of geometry are necessarily unproven/undefined. It seems that in every human system of thought discoveries and inventions must be built upon faith. Less vaguely, in geometry, the most basic unit - the point - cannot be defined. What are some other important postulates or axioms that geometry cannot exist without, but cannot prove, either?

Euclid, in his Elements, started with “definitions” that really didn’t define anything in a formal sense; they just indicated what he had in mind. Modern versions of geometry replace these with “undefined terms”, from which other terms are defined, just as theorems are derived from postulates (axioms).

As discussed above, Euclid then listed five postulates, which are assumed to be true.

Is this all just blind faith? Doctor Rick pondered this:

Hi, Han, I like thought-provoking questions like this. I agree with you about the necessity of faith as it relates to our knowledge of and interaction with the real world. In math, though, I see things a little differently. Math in itself is not intrinsically connected to the real world. It is possible, and perfectly okay, to develop a mathematical system that doesn't relate to anything in the real world. It is, as you say, necessary to have "undefined terms" describing entities in the system, and "postulates" (unproven facts relating those entities). But these are not so much matters of faith as "rules of the game." They are rules that we must adhere to if we are going to prove theorems within the particular mathematical system.

Just as postulates define the subject we are analyzing (such as the way lines interact to form a plane), undefined terms are necessary in order to name what the postulates are talking about. The terms are like the objects used in a game (chessmen, for example), and the postulates are the rules for playing (how a knight can move, how a piece is captured, etc.). All of this may be merely hypothetical (“imagine a world where …”), or applied (“let’s assume the world …”). “Faith” would apply, if at all, only in the latter, not in the math itself.

We aren't allowed to introduce additional assumptions (undefined terms or postulates) or alter them without explicitly stating the new assumptions. When we do so, we are no longer working in the same mathematical system. It may be a perfectly valid system, but it isn't the same one once its rules have been changed, even the slightest bit. Many mathematical systems - probably all until the last two centuries or so - were motivated by attempts to describe and explain things in the real world. At this point, math overlaps with science, and faith becomes relevant. Do the undefined terms and postulates of our system correspond to elements of the real world and their interactions? We can't know. In all likelihood, they don't correspond exactly, but they may make a good approximation.

So “faith” is needed in *applying* the math, either in taking a teacher’s word for it that gravity follows certain laws, say, or in trusting that our experiments reveal truth about the underlying rules of the world. (Generally, what we discover is close enough to use, given the accuracy of our measurements.)

For instance, a "point" in geometry can be thought of as something with no length, width, or breadth. Everything in the real world has some length, width, and breadth; we can only approximate a point by making a dot with the sharpest pencil we can get. (Physicists now think that electrons may actually be points, but electrons obey the laws of quantum physics, which is rather more complicated than ordinary geometry.) Still, somehow, geometry is very useful in describing the real world, even though strictly speaking, it describes things that don't exist in the real world.

As we become able to measure smaller things, we discover that rules we thought we knew aren’t quite exact; but they are still good enough for rough use. When we apply math to the rules we choose, we are determining how things would work in that “world”, which is not quite the real one but is close.

I said that you can change the rules and come up with a new system. Euclid had 5 postulates in his system of geometry. You can see them here, along with his undefined terms (he called them "definitions", but not all of them are) and "common notions" (actually postulates that are more fundamental than geometry): Euclid's Elements, Book I (David Joyce) http://aleph0.clarku.edu/~djoyce/java/elements/bookI/bookI.html The fifth postulate was a lot more complicated than the others. It wasn't very pretty, but it seemed to be needed in order to prove some basic facts about real-world geometry - for instance, that the angles in a triangle add up to 180 degrees. Over the years, people tried to prove the fifth postulate, thinking that something so complex must somehow follow from the simpler postulates. They failed. In the nineteenth century, mathematicians tried a different tack: try changing the postulate, and see what happens. They found that they ended up with several varieties of "non-Euclidean" geometry that were completely self-consistent, but different from Euclid's geometry. Changing the "rules" made a new but perfectly good game. So what do you think happened next? Einstein came along and discovered that these non-Euclidean geometries were just the thing to describe the real-world interactions of objects with mass - that is, to describe gravity. This is a case where the mathematical system was invented with no consideration of the real world (and therefore no faith element), but it turned out that this system does appear to describe the real world. The experiments to show that Einstein's theory of general relativity do describe the real world better than any other mathematical system are very tricky; it is still possible that another system would do better. We can be absolutely sure that the results of general relativity theory follow from its assumptions; the only question is whether or not those assumptions match the way the real world is.

Science, as we see, also makes assumptions; its assumptions are about the real world, and can therefore be wrong; the assumptions in math just define what the math is about, and can’t be wrong *until* we try to apply the math to something pre-existing.

Let me put it another way. There are two kinds of truth; I'll call them mathematical truth and real-world truth. Mathematical truth means that a statement is consistent with the assumptions of a particular mathematical system. In a sense, people created that system, and they can tell absolutely whether the statement is true within that system. ... Real-world truth is of a different order: it means that a statement is consistent with the particular system that is the real world. There is only one real world, and no human created it; no one knows exactly what the rules are. Scientists try to make rules that seem to describe the real world, but they can't possibly know whether these rules really describe everything in the universe. So yes, faith is necessary, because we did not create the real world, so we can't know absolutely what the rules of this system are ... unless someone from outside this system - the creator of the system - lets us know the rules.

To close out, let’s look at a student’s attempt to define “line”:

Undefined Geometry Terms I know that they call point, line, and plane the undefined terms of geometry, but is there a way to give those terms a definition? I've been thinking, and you may not be able to give all them a definition, but a line could a line be defined as the inconclusive conjunction (or joining) of two rays going in separate directions. I've never really thought that anything couldn't have a definition, so is it possible for any of these geometric terms to be defined?

Jake is wise to suppose only that *some* of the undefined terms could be defined, in terms of others (just as perhaps a claimed postulate might be found to be provable in terms of others, and thus demoted to a theorem). But his attempted definition, which I might refine as “A line is the union of two rays going in opposite directions from the same point,” proves the point, as I explained:

Your "definition" would require us to first define "ray" and "direction." Can you do that without reference to "point," "line," and "plane"?

A definition has to be given in terms of previously known terms. In the usual presentation, “ray” is defined in terms of “line” and “point”, and “direction” isn’t really given a definition at all. So his attempted definition has to either be circular, or go outside the system.

Think of it this way: Math is a huge building, in which each part is built by a solid line of inference upon other parts below it. What is the foundation? What is everything else built on? There must be some lowest level that is not based on anything else; otherwise the whole thing is circular, and never really starts anywhere. The "undefined terms" are part of that foundation, along with other things like rules of inference that tell us that logic itself is true. The goal of mathematicians like Euclid has not been to make math entirely self-contained, with no undefined terms, but to minimize the number of them so that we have to accept only a few basics, and from there will find all of math to be absolutely certain. Also, the goal is to make those terms "obvious," so that we have no trouble accepting them, even though we can't formally prove their existence.

Interestingly, geometrical concepts have been applied to non-geometrical ideas, by assigning the undefined terms differently: if you take “point” to mean “person”, “line” to mean “family”, and so on, you might define a “geometry of people”, for example. This is another reason for undefined terms; it separates the logic from the application.
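To see how the logic can be separated from the interpretation, here is a small Python sketch (an illustration added here, not something from the original answers). It treats "point" and "line" as uninterpreted labels and checks three simple incidence axioms against whatever model it is given. The axioms chosen, the function name `satisfies_incidence_axioms`, and the sample data (three people and three teams, standing in for the "people" and "families" mentioned above) are all assumptions made up for this example.

```python
from itertools import combinations

def satisfies_incidence_axioms(points, lines):
    """Check three incidence axioms for a finite model.

    points: a list of labels; lines: a dict mapping a line name to a set of points.
    """
    # Axiom 1: any two distinct points lie together on exactly one line.
    for p, q in combinations(points, 2):
        shared = [name for name, members in lines.items()
                  if p in members and q in members]
        if len(shared) != 1:
            return False
    # Axiom 2: every line contains at least two points.
    if any(len(members) < 2 for members in lines.values()):
        return False
    # Axiom 3: there are three points that do not all lie on one line.
    return any(
        not any({p, q, r} <= members for members in lines.values())
        for p, q, r in combinations(points, 3)
    )

# "Points" are people and "lines" are teams (hypothetical data).
people = ["Ann", "Bob", "Cy"]
teams = {
    "budget": {"Ann", "Bob"},
    "events": {"Bob", "Cy"},
    "safety": {"Ann", "Cy"},
}
print(satisfies_incidence_axioms(people, teams))

# A model that breaks Axiom 3: everyone is on the same single team.
one_team = {"everyone": {"Ann", "Bob", "Cy"}}
print(satisfies_incidence_axioms(people, one_team))
```

The checker never needs to know what a "point" really is. Any theorem proved from these three axioms alone would hold for the people and teams just as it would for dots and strokes on paper.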

To put it another way, these terms do have a definition, in human terms; we can easily understand what they mean. They simply don't have a mathematical definition in the sense of depending only on other previously defined terms.

Next time, we’ll look into the fact that different geometry textbooks list different postulates.

In Monday’s post about fallacies in calculus, one of them used the definition of the derivative (or rather, misused it). Today we’ll look at a short question about applying that same definition, that came in last month.

The question came from Hashem:

We all know what the arcsin(x) derivative is, but the problem is that I’m trying to obtain its derivative by applying the general definition of derivative, rather than the method of implicit differentiation, but I’m stuck and don’t know how to proceed after substituting the arcsin(x) in the definition, although I have searched the internet and found nothing. Finally, if it doesn’t work out, is that means the definition is wrong or something wrong in it? I mean the definition should apply for all functions right?

Please help me out with this problem.

It is common in calculus classes to start out by directly applying the definition of the derivative, \(\displaystyle f^\prime(x) = \lim_{h\rightarrow 0} \frac{f(x + h) - f(x)}{h}\), to several relatively simple functions like \(f(x) = x^2\); sometimes slightly more complicated derivatives will be determined this way. Subsequently, other methods (derived from the definition) are used to find derivatives, including implicit differentiation. We rarely go back to the definition.
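For a concrete feel for that definition, here is a minimal Python sketch that evaluates the difference quotient for \(f(x) = x^2\) at a point, with smaller and smaller values of *h*. The names `difference_quotient` and `square`, and the sample point 3, are choices made for this illustration; a numerical table like this only suggests the limit and does not prove it.

```python
def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h from the definition of the derivative."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

# Algebraically the quotient simplifies to 2x + h, so at x = 3 it approaches 6.
for h in (0.1, 0.01, 0.001, 0.0001):
    print(h, difference_quotient(square, 3.0, h))
```

For this function the algebra is easy: \(\frac{(x+h)^2 - x^2}{h} = 2x + h\), which clearly tends to \(2x\). The difficulty discussed below is that no such simplification is readily available for the arcsine.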

In this case, the usual way to obtain the derivative is to write the definition of the arcsine, \(y = \arcsin x \Leftrightarrow \sin y = x\), and differentiate both sides:

\(\displaystyle \frac{d}{dx}\sin y = \frac{d}{dx}x \Rightarrow \cos y\frac{dy}{dx} = 1 \Rightarrow \frac{dy}{dx} = \frac{1}{\cos y} \Rightarrow \frac{dy}{dx} = \frac{1}{\sqrt{1 - x^2}}\)

Hashem wants to apply the definition directly to this function. Does that definition always work, or does it have “limits”?

Doctor Fenton replied:

Hi Hashem,

No, there is nothing wrong with the definition. The problem is how to show that the limit of the difference quotient in the definition

\(\displaystyle \lim_{x\rightarrow a} \frac{f(x)-f(a)}{x-a}\)

actually exists. With many functions, such as f(x)=Ax+B, this is straightforward, and even with more complicated functions such as f(x)=x^2 or f(x)=√(x), there are algebraic identities (factoring and rationalization) that we can use to simplify the difference quotient to a form in which it is straightforward to show that the limit exists, and to evaluate it. With the function f(x)=sin(x), these algebraic simplifications no longer work, but we still know a lot about the sine function, and can use known properties of sin(x) to show that

\(\displaystyle \lim_{x\rightarrow 0} \frac{\sin(x)}{x} = 1,\)

and with other trigonometric identities, this will show that sin(x) is differentiable.

But generally, you have to know a lot of properties of f(x) to show that the difference quotient has a limit. With functions such as arcsin(x), it is hard to determine the relationship between f(x)-f(a) and x-a, because one of the main ways to find such relationships is by using the derivative, which would make the argument circular. Instead, there are general theorems such as the Implicit Function Theorem, the Inverse Function Theorem, and the (First) Fundamental Theorem of Calculus which can guarantee that under certain conditions a function is differentiable.

If a function of interest can be shown to be differentiable using one of these theorems, you can probably construct an argument based on the argument of the theorem.
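The role of the limit \(\sin(x)/x \rightarrow 1\) can be seen numerically. The sketch below (an added illustration with hypothetical function names) first tabulates \(\sin(h)/h\), then splits the difference quotient of the sine using the angle-sum identity \(\sin(a+h) = \sin a\cos h + \cos a\sin h\), which is the standard way the known properties of the sine are brought in.

```python
import math

def sine_ratio(h):
    """The ratio sin(h)/h, whose limit as h -> 0 is 1."""
    return math.sin(h) / h

def sine_quotient_split(a, h):
    """Difference quotient of sin at a, rewritten with the angle-sum identity:
    (sin(a+h) - sin(a))/h = sin(a)*(cos(h) - 1)/h + cos(a)*sin(h)/h
    """
    return math.sin(a) * (math.cos(h) - 1) / h + math.cos(a) * sine_ratio(h)

a = 1.0
for h in (0.1, 0.01, 0.001):
    # The first term tends to 0 and the second to cos(a).
    print(h, sine_ratio(h), sine_quotient_split(a, h), math.cos(a))
```

As *h* shrinks, the first column of results approaches 1 and the second approaches \(\cos a\), which is the derivative of the sine at *a*.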

As he says, applying the definition becomes much more difficult when a function is more complicated. We know many identities that can be applied to trig functions to simplify a difference quotient and find a limit; fewer identities are *familiar* for the inverse trig functions, and perhaps fewer are even *available at all*. But in principle it should be possible to merge *theorems* about the derivative with the definition, so as to apply the definition in the same way it is applied in the *proof* of the theorem. Now he demonstrates this:

For example, you need to show that

\(\displaystyle \lim_{x\rightarrow a} \frac{\arcsin(x)-\arcsin(a)}{x - a}\)

exists. If we let t = arcsin(x) and A = arcsin(a), then sin(t)=x and sin(A)=a, and we can write

\(\displaystyle \frac{\arcsin(x) - \arcsin(a)}{x - a} = \frac{t - A}{\sin(t) - \sin(A)},\)

and the difference quotient on the right should look familiar: it is the reciprocal of the difference quotient for sin(x) at A. You need to show that if x->a, then t->A, which just says that arcsin(x) is continuous. So the limit of the left side as x->a will have the same value as the limit of the right side as t->A, and you know what that limit is. (This is basically the general argument for the Inverse Function Theorem in one dimension, restated for the specific function arcsin(x), the inverse function of sin(x).) Can you finish the argument from here?

The Inverse Function Theorem says that, under appropriate conditions, \(\displaystyle \left(f^{-1}\right)^\prime (y) = \frac{1}{f^\prime (x)}\). That is, \(\displaystyle\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}}\). It is closely related to implicit differentiation. By making a change of variables from *x* and *a* to *t* and *A*, we end up with a difference quotient whose limit can be found in the same way we do for the sine. So this derivation amounts to intermingling the proof of the derivative of the sine with a proof of the inverse function theorem.
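Here is a minimal numerical sketch of that change of variables, added as an illustration. It computes the arcsine's difference quotient in *x* and *a*, the reciprocal of the sine's difference quotient in *t* and *A*, and the value \(1/\sqrt{1 - a^2}\) that both should approach. The function names and the sample point \(a = 0.5\) are assumptions for this example.

```python
import math

def arcsin_quotient(x, a):
    """Difference quotient of arcsin, written in terms of x and a."""
    return (math.asin(x) - math.asin(a)) / (x - a)

def reciprocal_sine_quotient(t, A):
    """Reciprocal of the difference quotient of sin, in terms of t and A."""
    return (t - A) / (math.sin(t) - math.sin(A))

a = 0.5
A = math.asin(a)                   # so that sin(A) = a
target = 1 / math.sqrt(1 - a * a)  # equals 1/cos(A)

for dx in (0.1, 0.01, 0.001):
    x = a + dx
    t = math.asin(x)               # so that sin(t) = x
    print(dx, arcsin_quotient(x, a), reciprocal_sine_quotient(t, A), target)
```

The two quotients agree (up to rounding) for every *x*, because they are the same number written two ways; and both approach \(1/\cos A = 1/\sqrt{1 - a^2}\) as *x* approaches *a*.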

Hashem responded:

Thank you very much Doctor Fenton, and thank you for your efforts, that was very helpful, and I completed the argument, but One final note, you said and I quote “But generally, you have to know a lot of properties of f(x) to show that the difference quotient has a limit. With functions such as arcsin(x), it is hard to determine the relationship between f(x)-f(a) and x-a”, so my question is: is it hard or impossible to determine the relationship? And is there a lack or shortage in properties of arcsin(x)? I mean, is there no alternative way in which I don’t do what you did by letting t=arcsin(x) and A=arcsin(a), because in Wikipedia I found this property of arcsin(x) subtraction, https://en.wikipedia.org/wiki/List_of_trigonometric_identities#Angle_sum_and_difference_identities, and so, Is this also don’t work out by any means?

The identities in this list are for differences of *angles*: \(\sin(\alpha - \beta)\), not for differences of *sines*, which is what \(\arcsin(\alpha - \beta)\) is, which would be needed for a fully direct derivation.

Doctor Fenton focused in his reply on the issue of proving an **impossibility**, and on the response to **difficulty**:

Kurt Goedel showed that in any axiom system which is strong enough to allow the construction of basic arithmetic, there are statements which are true but not provable in the system. When you can’t prove something, it could be because you just haven’t figured out how to prove it, or it may be impossible to prove. Until someone proves (or disproves) it, you can’t know which is true. Computing a derivative directly from the definition can be like trying to climb a 500 meter sheer cliff, but there may be an easy path to the top if you go around to the back of the mountain. Unless you really enjoy difficult climbing, there is an easier method to get to the top, which may let you attempt more interesting climbs. Computing some derivatives directly using the definition is a very valuable step in learning what being differentiable means, but it is not the only way, and it isn’t necessary to use the definition directly.

Thus, trying for a direct proof of something that can be proved indirectly with ease is not generally worth the struggle. But giving it a try can be educational — if you don’t keep trying too long and exhaust yourself.

In searching for answers to include in Monday’s post on calculus fallacies, I ran across a long discussion that illustrates some important aspects of methods of integration. In particular, there are often multiple ways to find an integral (the best not necessarily being the one taught in your textbook); and different methods will sometimes result in answers that appear to be different from that in the textbook.

We eventually discussed several examples, but we started with one particularly challenging one:

Integrals of the Cosecant, and of the Square Root of a Sum of Squares How does one integrate csc(x) and get the right answer? I searched and came upon this explanation: http://math2.org/math/integrals/more/csc.htm But as a newcomer to trig identities and calculus, how would I ever think to multiply by csc(x) + cot(x)? That Math2.org page understates it right from the start: this "strategy is not obvious"! I did it a different way. Basically, I wrote csc(x) as 1/sin(x), then as sin(x)/sin^2(x), and again as sin(x)/(1 - cos^2x). Here I substituted, letting u = cos(x), -du = sin(x)dx. Next, I plugged in to get the integral of -du/(1 - u^2). Finally, I factored the denominator into (u - 1)(u + 1) and did partial fractions to get an answer of (1/2)ln|cos(x) - 1| - (1/2)ln|cos(x) + 1| This is wrong. Why? Same question for how to integrate sec(x). Why do I have to multiply top and bottom by sec(x) + tan(x) and then do a u-substitution? and how I would I ever know to do that?

Here is what Winnie did, written out:

\(\displaystyle\int\csc x\ dx = \int \frac{1}{\sin x}\ dx = \int \frac{\sin x}{\sin^2 x}\ dx = \int \frac{\sin x}{1 - \cos^2 x}\ dx = \int \frac{-du}{1 - u^2}\) \(\displaystyle = \int \frac{du}{u^2 - 1} = \int \frac{du}{(u - 1)(u + 1)} = \int \left(\frac{1/2}{u - 1} - \frac{1/2}{u + 1}\right)du\) \(\displaystyle = \frac{1}{2} \ln|u-1| - \frac{1}{2} \ln|u+1| = \frac{1}{2} \ln|\cos x-1| - \frac{1}{2} \ln|\cos x+1|\).

The answer on the page she referred to looks very different:

\(\displaystyle\int\csc x\ dx = -\ln\left|\csc x + \cot x\right|\)

Plus the constant, of course. (I’ll come back to their *method* later.)

So her two questions are: (1) How would anyone come up with their method?, and (2) Why is her method wrong? I replied:

Integration is an art, not a routine skill. Some problems can only be solved by using special tricks that you just have to know -- because someone showed it to you -- and some are originally done just by recognizing that you have differentiated something previously that gave, as an integrand, the expression you want.

In other words, integration is much like division — you know that \(56 \div 7 = 8\) largely because you have previously run across the fact that \(56 = 7 \times 8\), so you can recognize the inverse.

Many integrals can be attacked in more than one way -- including csc(x). To integrate it, you don't *have* to use the strategy on Math2.org. The first person to do that most likely did so by working backward from the answer derived via another strategy. Once they had done that, they probably stood back, compared the old and new methods, considered the new one the most direct possible route to the answer, and started telling others about it. It ultimately became the standard, "elegant" way to derive it in textbooks -- but like many things found in math texts, it is the end product of lots of thinking, reflection, and re-thinking. Such a nonobvious method is definitely *not* intended as something anyone would even try their first time through!

In other words, the work we tend to put on display is a polished, final product. We don’t show our rough work. But in teaching, we really need to show how that product is created — just as a sculptor, in teaching how to sculpt rather than just showing off his brilliance, needs to show what the block of marble looks like while it is being worked on.

I do see a way that you might intuit your way to this strategy -- perhaps not when you are faced with this integral, but just when you are in a playful mood (!): If you know that the derivatives of cot(x) and csc(x) are, respectively, -csc^2(x) and -csc(x)cot(x), it might occur to you that the derivative of cot(x) + csc(x) happens to be -csc(x)(csc(x) + cot(x)). If you've got that fact squirreled away in your mind, it might just pop up in a flash of insight.

Good mathematicians are not those who figure things out only when they stare at a new problem, but who are constantly on the lookout for ideas that might be useful in future problems. (The same is true of good authors, or good detectives, or good inventors, or whatever!) After solving a problem, they look back to see if they could have done better, and also to see if there are other problems their discoveries could help with. Here, seeing \(\cot x + \csc x\) hidden inside its own derivative provides a potential trick …

So that is one way the elegant method might have been invented; or it might have been a matter of working backward from the answer obtained a different way. Knowing that the answer is \(\displaystyle\int\csc x\ dx = -\ln\left|\csc x + \cot x\right|\), perhaps by the method we’ll be discussing next, we might just think of getting \(\csc x + \cot x\) into the integrand by multiplying top and bottom:

\(\displaystyle\int\csc x\ dx = \int\csc x\left(\frac{\csc x + \cot x}{\csc x + \cot x}\right)\ dx = \int\frac{\csc^2 x + \csc x\cot x}{\csc x + \cot x}\ dx\) \(\displaystyle = \int \frac{-du}{u} = -\ln|u| = -\ln\left|\csc x + \cot x\right|\)

That is what the math2 site did. But they didn’t invent it themselves; they just passed on a nice trick they had learned.
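Any proposed antiderivative can be checked by differentiating it. As an added illustration, here is a small Python sketch that does this numerically for \(-\ln\left|\csc x + \cot x\right|\), comparing a central-difference estimate of its derivative with \(\csc x\). The helper names and the sample points are assumptions for this example, and agreement at a few points is evidence rather than proof.

```python
import math

def csc(x):
    return 1 / math.sin(x)

def cot(x):
    return math.cos(x) / math.sin(x)

def textbook_antiderivative(x):
    """The textbook form: -ln|csc x + cot x|."""
    return -math.log(abs(csc(x) + cot(x)))

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) using a symmetric difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Sample points chosen away from multiples of pi, where csc is undefined.
for x in (0.5, 1.0, 2.5, 4.0):
    print(x, central_difference(textbook_antiderivative, x), csc(x))
```

At each sample point the two columns agree to several decimal places, including at \(x = 4\), where the sine is negative; that is the job the absolute value inside the logarithm is doing.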

Now let’s turn to the other question: Why is *Winnie’s* method/answer *wrong*? It isn’t!

Now, as far as your work integrating csc(x) in a "different way" ... your method is not wrong -- nor is your answer! Let's take your answer and show that it is equivalent to theirs:

\(\displaystyle \frac{1}{2}\ln|\cos x - 1| - \frac{1}{2}\ln|\cos x + 1| = \frac{1}{2}\ln\left|\frac{\cos x - 1}{\cos x + 1}\right|\)
\(\displaystyle = \frac{1}{2}\ln\left|\frac{(\cos x - 1)(\cos x - 1)}{(\cos x + 1)(\cos x - 1)}\right| = \frac{1}{2}\ln\left|\frac{(\cos x - 1)^2}{\cos^2 x - 1}\right| = \frac{1}{2}\ln\left|\frac{(\cos x - 1)^2}{-\sin^2 x}\right|\)
\(\displaystyle = \ln\left|\frac{\cos x - 1}{\sin x}\right| = \ln\left|\frac{\cos x}{\sin x} - \frac{1}{\sin x}\right| = \ln\left|\cot x - \csc x\right| = \ln\left|\csc x - \cot x\right|\)

I had a general idea what form of answer I wanted to transform Winnie’s answer to, so I pulled the 1/2 inside the log (as a square root), then expressed the argument of the log in terms of csc and cot. But …

That's not quite where we wanted to go; but rather than go back and do it a little differently, I'm going to let you either make a little change to my work so we end up where we want to be, or start with my end product and take it the rest of the way to the answer you found, namely -ln|csc(x) + cot(x)|.

One of the beautiful but frustrating things about trig functions is that what at first looks entirely different can turn out to be exactly the same. Trig lets you express the same relationships in so many different ways!

A good argument could be made that my form of answer is nicer than theirs (no negative on the outside); the important thing is that we now have three different answers, which all look different but turn out to be equivalent. This is typical of trigonometric problems. (Hint: one way to convert mine to theirs is to multiply the numerator and denominator by \(\csc x + \cot x\).)
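A numerical spot-check can make the equivalence concrete. The sketch below is my own illustration (the function names are invented here, not part of the original exchange): it evaluates Winnie's answer, my form, and the site's form at a few arbitrary points where \(\sin x \ne 0\), and reports whether they agree.

```python
import math

def winnie_form(x):
    # (1/2)ln|cos x - 1| - (1/2)ln|cos x + 1|
    return 0.5 * math.log(abs(math.cos(x) - 1)) - 0.5 * math.log(abs(math.cos(x) + 1))

def csc_minus_cot_form(x):
    # ln|csc x - cot x|
    return math.log(abs(1 / math.sin(x) - math.cos(x) / math.sin(x)))

def site_form(x):
    # -ln|csc x + cot x|
    return -math.log(abs(1 / math.sin(x) + math.cos(x) / math.sin(x)))

def forms_agree(x, tol=1e-9):
    # True when all three expressions give the same value at x (within tol)
    values = (winnie_form(x), csc_minus_cot_form(x), site_form(x))
    return max(values) - min(values) < tol

sample_points = [0.3, 1.0, 2.5, 4.0, 5.5]  # arbitrary points, none a multiple of pi
print(all(forms_agree(x) for x in sample_points))
```

Here the three forms happen to agree exactly; in general, two correct antiderivatives only have to agree up to an added constant.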

Having said all this, I found the following page in our archive: Integration Trick http://mathforum.org/library/drmath/view/53537.html In that Dr. Math conversation, a student came up with something much like your answer, and Doctor Jerry (a) checked the answer by differentiation, showing it was correct; (b) found yet another form of the answer in a table; and (c) told about the elegant way -- which he recalls being taught, rather than finding on his own! It's a sort of math lore, passed down from one generation to another, rather than a wheel that humans invent and re-invent on their own.

Though your method takes more work, it is an excellent one. In fact, if I were writing a textbook, I'd relegate the quick method to a footnote, to show how creative we can be. But in the main text, yours is the method I would use -- the better to show other students that you can find a solution for yourself using ordinary methods, rather than having to solve the whole thing in one super-insightful leap.

At this point Winnie asked for some clarification, which you can read if you wish; then she asked:

In addition, if you wouldn't mind helping me further, I am struggling a bit with trig substitutions. My teacher actually never taught me this method, but I am 100% sure that, on the test, there will be something that requires it. Could you direct me to a good archive or website that thoroughly goes through the general approaches without being too dense? Or could you explain it to me with an example? For starters, here's a question from my review package that I don't get: Find the integral of dx/(sqrt(x^2 + 16)) I thought it would involve tangent at first, but the square root threw me off. Bummer ... Maybe some trig substitution would work?

The specific example here is \(\displaystyle\int \frac{dx}{\sqrt{x^2 + 16}}\). Her impression of the appropriate method is exactly right; she just needs to follow through:

The basic trig substitutions remind me of the Pythagorean identities, insofar as I don't have to specifically remember a table of substitutions. Here's a site I refer to from time to time: Paul's Online Math Notes: Trig Substitution

In your case, x^2 + 16 reminds me of tan^2(u) + 1 = sec^2(u). If I divide by 16, I have (x/4)^2 + 1, so I want to let tan(u) = x/4. That is, I replace x with 4 tan(u), so that

x^2 + 16 = [4 tan(u)]^2 + 16 = 16[tan^2(u) + 1] = 16 sec^2(u)

So sqrt(x^2 + 16) = 4 sec(u). And dx/du = 4 sec^2(u), so dx = 4 sec^2(u) du. The integrand therefore becomes

dx/sqrt(x^2 + 16) = [4 sec^2(u)/(4 sec(u))] du = sec(u) du

From here, we just have to look up the integral of sec(u). So you were right about using the tangent; and the fact that there is a square root isn't a problem -- just something to address after the substitution, once we see how much closer it brought us to an answer. Never be afraid to try something; you don't have to be sure it will work! Each thing you do just makes it easier to see what to do next (even if it's to back up and try something else!).

(If we don’t just look up \(\displaystyle\int\sec u\ du\) in a table (it’s inside the back cover of a book I’ve used), we would do the same sort of thing we did above.)

She wrote back, showing her work of finishing up with this integral; the work looks almost identical to the original work she showed for the cosecant, so I will not repeat it. But she didn’t finish this, by going back to the original variable; and this may have happened because she used the same name for the new variable (which I called u), so that there was nothing to remind her that she wasn’t finished. This is a very bad practice; two things called by the same name within the same problem should be the same thing!

Then, having given up on that, she showed another attempt at the same integral, \(\displaystyle\int \frac{dx}{\sqrt{x^2 + 16}}\), different from the trig substitution we just did:

In addition, I also tried using this substitution: set u = x^2 + 16, du = 2xdx, dx = du/2x. Then x = sqrt(u - 16). Subbing in, we get (1/2)INT[du/(u^(1/2) * (u - 16)^(1/2))] I multiplied same square roots to get (1/2)INT[du/(sqrt(u^2 - 16u))]. But now how do I integrate 1/(sqrt(u^2 - 16u))? I have no idea. Tried writing it as (u^2 - 16u)^(-1/2), but then what? Do I divide by the derivative of u^2 - 16u? But I still wouldn't get my desired answer of ln|x + sqrt(x^2 + 16)|...

I was a little confused here, as to what integral she was referring to. But I did say the right things. First,

The method you show here just doesn't work; it took you to a dead end (or at least to a point where I don't feel like doing the work to get it back on track, if that is even possible). The standard way to deal with sqrt(u^2 - 16u) is to complete the square, but I think that might just get you back to where you started.

Handling dead ends is part of the art of integrating: Even the most experienced sometimes just have to try things, and abandon methods that are not getting anywhere. Experience just lets you recognize more quickly when a method will not work.
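For the record, the completing-the-square route can be carried through. Here is a sketch, assuming \(x \ge 0\) (as Winnie's step \(x = \sqrt{u - 16}\) already does) and using the standard table result \(\displaystyle\int\frac{dw}{\sqrt{w^2 - k^2}} = \ln\left|w + \sqrt{w^2 - k^2}\right|\):

\(\displaystyle\frac{1}{2}\int\frac{du}{\sqrt{u^2 - 16u}} = \frac{1}{2}\int\frac{du}{\sqrt{(u - 8)^2 - 64}} = \frac{1}{2}\ln\left|u - 8 + \sqrt{u^2 - 16u}\right|\)

Putting \(u = x^2 + 16\) back in gives \(\frac{1}{2}\ln\left(x^2 + 8 + x\sqrt{x^2 + 16}\right)\). Since \(\left(x + \sqrt{x^2 + 16}\right)^2 = 2\left(x^2 + 8 + x\sqrt{x^2 + 16}\right)\), this equals \(\ln\left(x + \sqrt{x^2 + 16}\right) - \frac{1}{2}\ln 2\), which is the expected answer up to a constant, reached by a longer road than the trig substitution.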

Then I finished the work for her:

Looking back, I showed you how to integrate dx/(sqrt(x^2 + 16)), by substituting x = 4 tan(u), turning it into the integral of sec(u). Now you can use what you did above (or, as I suggested, look it up in a table, as this is a standard result that we don't usually reproduce for ourselves each time), finding that it is ln|tan(u) + sec(u)|. [I've been leaving out constants of integration to keep things simple.]

Now reverse the substitution:

sec(u) = sqrt(tan^2(u) + 1) = sqrt((x/4)^2 + 1)

This leads us to

ln|tan(u) + sec(u)| = ln|(x/4) + sqrt((x/4)^2 + 1)|
= ln|(x/4) + sqrt((x^2 + 16)/16)|
= ln|(x + sqrt(x^2 + 16))/4|
= ln|x + sqrt(x^2 + 16)| - ln(4)

This differs from your desired answer by only a constant that gets absorbed into the constant of integration.

This last point is how this discussion is related to my last post, where I discussed the constant of integration, and how it can make an answer look wrong. We can just drop the “- ln(4)”, because it just changes the value of the constant, whose value doesn’t matter anyway. So we get the book’s answer. The final answer, of course, needs the “+ C ” at the end; I have been omitting it for brevity, keeping in mind that what I write represents an equivalence class of functions.
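As a sanity check on that last step, here is a small numerical sketch (my own, with made-up function names). It compares the book's answer with the form that comes straight out of the substitution, and checks by a central difference that the book's answer really does differentiate back to the integrand.

```python
import math

def book_answer(x):
    # ln|x + sqrt(x^2 + 16)|
    return math.log(abs(x + math.sqrt(x * x + 16)))

def substitution_answer(x):
    # ln|(x/4) + sqrt((x/4)^2 + 1)|, the result straight from x = 4 tan(u)
    return math.log(abs(x / 4 + math.sqrt((x / 4) ** 2 + 1)))

def integrand(x):
    return 1 / math.sqrt(x * x + 16)

def numerical_derivative(f, x, h=1e-6):
    # central difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-3.0, 0.0, 2.0, 10.0):  # arbitrary sample points
    gap = book_answer(x) - substitution_answer(x)
    slope_error = abs(numerical_derivative(book_answer, x) - integrand(x))
    print(x, round(gap, 9), slope_error < 1e-6)
```

The gap printed on every row should be the same number, ln(4), which is exactly the constant we dropped.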

Of course, these aren't really proofs, because they all have some error in them. What's important about these examples is that they show ways in which you can make a mistake in using math if you aren't careful enough. If you can understand where the error is, then you can look for the same kinds of errors in your own work, whether it's a proof for school or a calculation you make when you're designing a bridge. It also explains why mathematicians and scientists don't publish their results without first having others check them to make sure there isn't some subtle error in their calculations.

Let’s first look at a fallacy in integration, which teaches a very important lesson. Here is the question, from 2001:

1 = 0 Fallacy Reading the Dr. Math pages - and especially the ones on 1 = 0 fallacies - I remembered a 'proof' we ran up against during high school (VWO in the Netherlands). It makes use of integral calculus. We learned the following rule for 'partial integrating': Int(f(x)*g(x))dx = f(x)*G(x) - Int(f'(x)*G(x))dx with G(x): the primitive function of g(x) and f'(x): the derivative of f(x) Now watch the following 'proof': Int(1/x^2 * 2x)dx = 1/x^2 * x^2 - Int(-2/x^3 * x^2)dx (step 1) this yields: Int(2/x)dx = 1 - Int(-2/x)dx = 1 + Int(2/x)dx (step 2) subtracting Int(2/x)dx on both sides yields: 0 = 1 (step 3) Quite remarkable, I think! We found two arguments that possibly explain the fallacy: 1) 1/x^2 * x^2 = 1 is an invalid step 2) we work with unbounded integrals. If we put lowerbound a and upperbound b to the integral, we get for step 2: Int(2/x)dx (a,b) = 1 (a,b) + Int(2/x)dx (a,b) which yields: Int(2/x)dx (a,b) = 1(b) - 1(a) + Int(2/x)dx (a,b) because 1(b) = 1(a) = 1 we get: Int(2/x)dx (a,b) = Int(2/x)dx (a,b) which of course is true. Which of these two arguments tackles the fallacy?

Recall that “integration by parts” uses the formula, \(\displaystyle\int u\ dv = uv - \int v\ du\), or, equivalently, \(\displaystyle\int u\ v^\prime\ dx = uv - \int u^\prime\ v\ dx\). It is explained here:

Choosing Factors When Integrating by Parts

What he has done here is to integrate \(\displaystyle\int \left(\frac{1}{x^2}\cdot 2x\right) dx\) by parts, using \(u=\frac{1}{x^2}\) and \(dv = 2x\ dx\). (In his terms, \(f(x) = \frac{1}{x^2}\), \(g(x) = 2x\), so that \(G(x) = x^2\).) Applying parts, we get:

\(\displaystyle\int \frac{1}{x^2}\cdot 2x\ dx = \frac{1}{x^2} \cdot x^2 - \int\frac{-2}{x^3} \cdot x^2\ dx = 1 - \int\frac{-2}{x}\ dx = 1 + \int\frac{2}{x}\ dx\)

But the integral we started with simplifies to

\(\displaystyle\int \frac{2}{x} dx\)

Rather than evaluate this (getting \(2\ln{x}\)), we notice that these two ways of simplifying the integral imply that

\(\displaystyle\int \frac{2}{x} dx = 1 + \int\frac{2}{x} dx\)

Subtracting the integral from both sides, we have 0 = 1!

What went wrong? I replied,

Your second explanation is essentially right. I would say it is an "indefinite" integral, rather than "unbounded." When you work with indefinite integrals, you always have to keep in mind that an arbitrary constant can be added to the result, since differentiation of a constant yields zero. So what you really have is Int(2/x)dx + C1 = 1 + Int(2/x)dx + C2 with a constant added to each side. This simplifies to C1 = 1 + C2 which of course doesn't say much, since C1 and C2 could be anything. That eliminates the problem entirely. Your method of turning the integrals into definite integrals amounts to the same thing; evaluating the constant at the limits makes it disappear, so you can ignore it.

This is a very important lesson to learn; for example, we often see students thinking that their answer for an integral is wrong because it doesn’t match the answer in the book, even after simplifying. The answer may be that the two answers differ by a constant. When this happens, the student ought to notice that the derivatives of the two answers are therefore the same (since the derivative of a constant is zero), so both are valid integrals. Here are some examples of this:

Calculus Constants Constant Oversight

Putting it another way, by parts we got an answer of

\(1 + 2\ln{x} + C\)

while the direct method yielded

\(2\ln{x} + C\)

We just need different values of *C* to make them match.
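To see that the stray 1 is harmless, here is a minimal sketch (my own illustration, with arbitrary limits) that uses each antiderivative to evaluate the same definite integral. The constant cancels when we subtract, so both give the same value up to rounding.

```python
import math

def by_parts_antiderivative(x):
    # the result of integrating by parts, with C = 0
    return 1 + 2 * math.log(x)

def direct_antiderivative(x):
    # the result of simplifying first, with C = 0
    return 2 * math.log(x)

def definite_integral(F, a, b):
    # fundamental theorem of calculus: F(b) - F(a)
    return F(b) - F(a)

a, b = 1.0, 3.0  # arbitrary positive limits
print(definite_integral(by_parts_antiderivative, a, b))
print(definite_integral(direct_antiderivative, a, b))
```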

We got a very similar question in 2003, with an even simpler integral:

Constant of Integration Using integration by parts integration of (1/x)dx = [x * (1/x)]+ integration of (1/x)dx After simplifying by using the addition property of equality and multiplication, the answer would lead to 0 = 1, which should be wrong. The proof seems correct.

That is, taking \(u = \frac{1}{x}\) and \(dv = dx\), we get

\(\displaystyle\int \frac{1}{x}\ dx = x \cdot \frac{1}{x} - \int x \cdot \left(-\frac{1}{x^2}\right) dx = 1 + \int\frac{1}{x}\ dx\)

As before, subtracting the integral from each side, we are left with 0 = 1. Of course, we know the answer now; Doctor Jacques suggested trying the definite integral approach to clarify what was happening:

This looks like a paradox indeed, but try to see what you get if you evaluate the integral over an interval [a,b]... Please feel free to write back if you are still stuck.

The student did that, but was not convinced about the indefinite form:

The problem is possible in the interval [a,b] but my teacher insists that there is a problem with the proof, while every one of us thinks the proof is correct.

Of course, a proof that 0 = 1 can’t be correct, unless all of math is wrong; presumably they just don’t see where the error is. So Doctor Jacques gave a brief explanation:

The problem is that an indefinite integral (antiderivative) is only defined up to an additive constant. More technically, it is not a single function, but an equivalence class of functions. For example, INT(0 dx) = C, where C is any constant, since the derivative of a constant is 0. In this case, we should have written: (INT{dx/x} + C_1) = 1 + (INT{dx/x} + C_2) and this merely shows that the constants must satisfy C_1 = 1 + C_2. When you compute a "real" integral, i.e. between limits, these constants disappear.

“Defined up to an additive constant” means that answers may differ by that “\(+ C\)” that students are taught to write at the end by rote. So the real answer is not just the function you write, but all functions that can be obtained by using different numbers for the constant — an “equivalence class”.
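As for the definite-integral version that Doctor Jacques suggested trying, here is my own sketch of it, assuming \(0 < a < b\). The same integration by parts with limits reads

\(\displaystyle\int_a^b \frac{1}{x}\ dx = \left[x \cdot \frac{1}{x}\right]_a^b + \int_a^b\frac{1}{x}\ dx = (1 - 1) + \int_a^b\frac{1}{x}\ dx\)

so the troublesome 1 cancels against itself, and nothing contradictory is left.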

This still left questions:

As I read your explanation I got confused with the constants. Isn't it that the constant of int(dx/x) on the left-hand side of the equation is equal to the constant at the right-hand side of the equation?

Writing “\(+ C\)” by rote leaves students not really understanding what it is! When I did that above (\(1 + 2\ln{x} + C\) and \(2\ln{x} + C\), using the same name *C* for the constant in each case), I had to consciously remind myself that *C* has to be different in each case. That is not obvious unless you think about it.

Doctor Jacques replied with a deeper explanation and a classic example:

These constants have no actual meaning - they are artificial. When we write INT{f(x)dx} = g(x) we simply mean that the derivative of g(x) is f(x). Of course, the derivative of g(x) + C, where C is _any_ constant, is also f(x). The particular constant that comes out depends on the method of integration. The whole point of the exercise is to show that different calculations can yield functions that differ by a constant. We can even illustrate this with the function 1/x in another simpler way. We know that the "true" integral is ln(x). Now, in INT{dx/x}, if we make the substitution ax = u, with a > 0, you will easily see that the result is ln(ax) = ln(x) + ln(a) = ln(x) + constant and, as we can take any positive number for a, we can make the constant ln(a) anything we wish. There is no contradiction in writing: INT{dx/x} = ln(x) INT{dx/x} = ln(x) + C because these are not true equalities between functions. This is exactly the same as modular arithmetic. When we write 2 = 7 (mod 5) the numbers 2 and 7 are not simple numbers. 2 represents all the numbers that are a multiple of 5 + 2, and 7 represents all the numbers that are a multiple of 5 + 7, and these sets of numbers are the same - that is what the equality means (in this case, we often use a special symbol instead of =, to mean congruence). In a similar way, an expression like INT{f(x)dx} represents, not a single function, but the set of all functions whose derivative is f(x). An equality between integrals is an equality between sets of functions.

If you are not familiar with modular arithmetic, when we write \(2 \equiv 7 (\text{mod } 5)\), it means that 2 and 7 are *equivalent* in the sense that their difference is a multiple of 5. They are both representatives of the same “equivalence class”, which consists of the numbers \(\{\dots , -8, -3, 2, 7, 12, \dots\}\). The same idea applies to the indefinite integral: it is really the *equivalence class* of the function we write, meaning that we can add *any* constant to it and it will still be equivalent. That is what the “\(+ C\)” means.
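For readers who like to experiment, here is a tiny sketch of that equivalence class (the helper name `same_class` is my own invention): it lists the integers in a small range that are equivalent to 2 modulo 5.

```python
def same_class(m, n, modulus=5):
    # m and n are congruent when their difference is a multiple of the modulus
    return (m - n) % modulus == 0

representatives = [n for n in range(-8, 13) if same_class(n, 2)]
print(representatives)  # [-8, -3, 2, 7, 12]
```

Any one of these numbers can stand for the whole class, just as any one antiderivative can stand for the whole family obtained by adding constants.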

Let’s move on to the other half of calculus. This question, from 2000, is a classic fallacy using differentiation:

Proof that 2 Equals 1 Using Derivatives

How can this be?

kx = x + x + ... + x (k-times) [1]
xx = x + x + ... + x (x-times) [2]
x^2 = x + x + ... + x (x-times) [3]
dx(x^2) = 2x (diff. wrt x) [4]
dx(x + x + ... + x) = 1 + 1 + ... + 1 (x-times) [5]
so 2x = 1 + 1 + ... + 1 (x-times) {from eq. [4],[5]} [6]
so we have 2x = x [7]
so 2 = 1 (x <> 0) [8]

Thank you!

Akram starts with the (debatable!) fact that multiplication means repeated addition, letting x itself be the multiplier in order to get the square. Then he differentiates both sides of \(x^2 = x + x + \dots + x\) to get \(2x = x\), so that (dividing by *x* if it is non-zero), 2 = 1.

Alternatively, at the last step, one could “solve” \(2x = x\) by subtracting *x* from both sides, yielding \(x = 0\): that is, every number (since x was unspecified) is equal to zero. That would include 1 = 0.

What went wrong?

First, as I read it, when he differentiated (using a nonstandard notation “dx” apparently meaning “d/dx”) \(\underbrace{x + x + \dots + x}_{x\text{ times}}\), he just differentiated each x to get 1, without considering that the number of terms is not constant. If you try to justify this by going to the definition of the derivative, you have to take the difference \(f(x + \Delta x)-f(x)\), which here becomes the difference of sums of different numbers of terms. Doctor Rick took it from there:

In taking the difference, you forgot that not only has each term changed its value, but also the NUMBER of terms has changed. Let's put in some numbers to make this clear. Let x = 3 and delta(x) = 1. Then:

x^2 = 3 + 3 + 3
(x+delta(x))^2 = 4 + 4 + 4 + 4
delta(x^2) = 1 + 1 + 1 + 4

We still don't have the 2x that you expected; we've got 7 instead of 6. Why is this? You forgot something else. 2x is the DERIVATIVE of x^2 - the limit of delta(x^2)/delta(x) as delta(x) approaches zero. But the function as we have defined it (as a sum of x terms) has meaning only for integer values of x, so delta(x) can't be less than 1. The derivative is not defined. All we can define is a DIFFERENCE, as I have done (with delta(x) = 1, the smallest possible value), and this is not equal to 2x.

You can read more about this by going to our Dr. Math FAQ on "False Proofs, Classic Fallacies", linked on our main FAQ page: http://mathforum.org/dr.math/faq/faq.false.proof At the bottom there is a link to "derivatives," an item in our archives that is directly related to your problem.

The reference at the bottom is to this answer by Doctor Rob:

Derivatives

Really, it’s hard to write something sensible when you try to see what is happening with specific numbers! The notation \(\underbrace{x + x + \dots + x}_{x\text{ times}}\), for the specific numbers 3 and 4, has to mean \(\underbrace{3 + 3 + 3}_{3\text{ times}}\) and \(\underbrace{4 + 4 + 4 + 4}_{4\text{ times}}\), respectively. But to take a derivative you have to be able to let *x* be 3.001, for example, as you take the limit; and it makes no sense to repeat something 3.001 times. The fact is that multiplication can only be thought of as repeated addition for whole numbers, so the foundation of the argument is faulty.

A reader in 2001 asked for a further explanation:

In the explanation you gave to show why the proof was wrong, x^2 = 3 + 3 + 3 (x+delta(x))^2 = 4 + 4 + 4 + 4 delta(x^2) = 1 + 1 + 1 + 4 what is delta(x^2)? There's a delta(x) = 1, and x = 3, but I thought delta(x) was a term of its own, not a function of x.

Doctor Rick clarified:

I took some shortcuts in my explanation, trying to correct the writer's notation without changing it too much. I'll go through it in a different way for you. We're interested in finding the derivative of the function f(x) = x^2 using the definition of x^2 as a sum of x copies of x. The claim was that you can differentiate f(x) = x + x + x + ... + x (x times) by taking the derivative of each term and adding: df(x)/dx = 1 + 1 + 1 + ... + 1 (x times) = x In my explanation of why this is wrong, I can't really talk about derivatives, because the function has been defined only for integer values of x. Therefore I wrote in terms of finite differences: delta(x) is a finite (integer) change in x, and delta(x^2) is the change in x^2 due to this change in x. Normally we would use the Greek capital delta, and drop the parentheses around the x. You're right, it's not a function. Delta(x^2) is defined formally as follows: delta(x^2) = f(x+delta(x)) - f(x)

That is, the derivative is defined as \(\displaystyle\frac{df(x)}{dx} = \lim_{\Delta x\rightarrow 0}\frac{f(x + \Delta x)-f(x)}{\Delta x}\). In this case, \(\displaystyle\frac{d(x^2)}{dx} = \lim_{\Delta x\rightarrow 0}\frac{(x + \Delta x)^2-x^2}{\Delta x}\). Our \(\Delta(x^2)\) is the numerator, \((x + \Delta x)^2-x^2\).

You might prefer it if I talk in terms of independent variable x and dependent variable y:

[1] y = f(x) = x + ... + x (x times)

Let's remain general rather than choosing delta(x) = 1. For any value of x and any change in x, delta(x), we can evaluate f(x+delta(x)), which will differ from y = f(x) by an amount delta(y):

[2] y + delta(y) = (x+delta(x)) + ... + (x+delta(x)) (x+delta(x) times)

Subtract [1] from [2] to get delta(y):

delta(y) = delta(x) + ... + delta(x) (x times)
+ (x+delta(x)) + ... + (x+delta(x)) (delta(x) times)

It's that second line that the writer ignored. Applying the "definition" of (integer) multiplication, we get

delta(y) = x*delta(x) + delta(x)*(x+delta(x)) = 2x*delta(x) + delta(x)^2

Then

delta(y)/delta(x) = 2x + delta(x)

Written out, he has said that \(\Delta y = \underbrace{(x + \Delta x) + (x + \Delta x) + \dots + (x + \Delta x)}_{x + \Delta x\text{ times}} - \underbrace{(x + x + \dots + x)}_{x\text{ times}}\) \(= \underbrace{\Delta x + \Delta x + \dots + \Delta x}_{x\text{ times}} + \underbrace{(x + \Delta x) + (x + \Delta x) + \dots + (x + \Delta x)}_{\Delta x\text{ times}}\).

All this is done, of course, ignoring the fact that \(\Delta x\) has to be an integer, so we can’t really take the limit.
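Staying within whole numbers, we can at least check the finite-difference formula directly. The sketch below is my own illustration (the function names are invented): it builds \(x^2\) literally as a repeated sum and compares the change with \(2x\,\Delta x + (\Delta x)^2\).

```python
def repeated_sum(x):
    # x + x + ... + x (x times); only meaningful for whole numbers x >= 0
    return sum(x for _ in range(x))

def finite_difference(x, dx):
    # delta(y) = f(x + delta(x)) - f(x)
    return repeated_sum(x + dx) - repeated_sum(x)

x, dx = 3, 1
print(finite_difference(x, dx))  # 7
print(2 * x * dx + dx ** 2)      # 7
```

With x = 3 and delta(x) = 1 this reproduces the 7 from Doctor Rick's example: the 6 that the fallacy expects, plus the extra term that comes from the change in the number of terms.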

If our function were defined on all real numbers rather than just integers, we would find the derivative by taking the limit of delta(y)/delta(x) as delta(x) approaches zero. The delta(x) term would go away, and we'd get the correct derivative. As it stands, you can see why (in my example in the original explanation) I got a difference of 7 instead of 6: there's an extra term delta(x)^2 = 1. Another correspondent suggested that we "define" multiplication for non-integers like this (in my own notation): x*y = x + ... + x ([y] times) + x*(y-[y]) where [y] is the greatest integer less than y. It's not very helpful, because it only defines multiplication by a number greater than 1 in terms of multiplication by a number less than 1 (namely, y-[y]). However, it does allow us to take delta(x) to zero. If you work through it, you'll find that the derivative works correctly.

Admittedly, much of this is really nonsense. But sometimes it is useful to examine nonsense to see why it is.

Incidentally, we have received at least two dozen questions equivalent to this one (going only by the number of times we referred to this answer). So this is not a rare issue!

We usually look here at problems or concepts that are relatively basic and generally applicable; that could give a wrong impression of the kinds of questions we get. Here I want to show a recent example of a discussion about a problem, related to a geometric figure called the arbelos, that is challenging though it does not require advanced knowledge. This illustrates how an interesting problem can stimulate thought and discussion beyond just solving it, and how a mix of individual problem-solving effort and research of known work can be illuminating.

Here is the question, from earlier this month:

Find the distance between the centres of blue circles.

Doctor Rick, as usual, asked to see Suyash’s work and ideas, in order to know what sort of help he needs; our goal is to help students be able to solve problems themselves, not just to give answers. He added,

In particular, I have no idea what sort of math you have been learning. Personally, I would tackle this as an analytic-geometry problem, defining a coordinate system (perhaps with the origin at the center of the large circle) and writing a set of equations to be solved for the radii and the coordinates of the centers of the two smaller circles. This might get a bit tedious, but it should be straight-forward and surely will work. Is that something you’d do, or do you think another method would be expected?

I find this to be a good way to start an interaction, giving some ideas while waiting for details from the student. Suyash responded,

Thanks for your help sir. Now I have solved this question by using simple geometry and Pythagoras Theorem. Thanks for replying. I appreciate it.

We could have left it at that, but the problem was too interesting! Doctor Rick, having worked on the problem while he waited, answered,

If you constructed lines parallel to the lines in the figure in order to apply the Pythagorean theorem, your classical geometrical solution may have been essentially the same as the analytic-geometry approach I mentioned. If your approach didn’t involve finding the distances of the centers of the two small circles from the horizontal and vertical lines, then you may have a much nicer solution, and I’d be interested to see it. The fact that the two circles have the same radius, and that it is a very simple number, suggests that there may be an elegant solution.

He was doing something we recommend to students who want to get the most out of a problem: looking back after solving it, to see if there is a nicer alternative method. When an answer is simpler than the method of solution, it often suggests that there is more to learn! Suyash was willing to continue the discussion:

Here’s my solution, we could also directly use distance formula to find the required solution instead of doing congruency.

(I did some extra processing on his photo, to make it easier to read.)

He has skipped some steps; this is really a summary of key steps in the work, so it takes a little work to follow what he is doing. This is a good time to practice reading math slowly, figuring out the reason for every symbol he writes! But it is very well written.

(If you need help understanding a step, feel free to write with a comment or question! There are several points that are worthy of discussion; one, the simplification of a nested radical near the bottom, will have to be a subject of a post soon.)

Doctor Rick responded with some comments:

Thanks, Suyash. That’s very good! I like the way you chose to solve a more general problem, with a and b rather than the specific numbers given. Because you did that, we have an answer to the question, “Is it just coincidence that the two circles have the same radius?” I notice that the formula you obtained can be expressed in the neat form

1/r = 1/a + 1/b

and the symmetry on interchange of a and b is sufficient to show that both circles have the same radius.
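Here is a rough coordinate check of that formula. It is only a sketch under my own assumptions, which are not taken from Suyash's work: I take a and b to be the radii of the two inner semicircles, put A at x = 0 so that C is at x = 2a and B at x = 2a + 2b, and give each small circle the radius r = ab/(a + b). The numbers at the end are arbitrary, not the ones from the problem.

```python
import math

def twin_radius(a, b):
    # r = ab/(a + b), equivalent to 1/r = 1/a + 1/b
    return a * b / (a + b)

def tangency_errors(a, b, side):
    """Return (external, internal) tangency errors for one small circle.

    side is 'left' (beside the semicircle of radius a) or 'right'
    (beside the semicircle of radius b). Both errors should be near 0.
    """
    r = twin_radius(a, b)
    big_center, big_radius = a + b, a + b        # outer semicircle on AB
    if side == "left":
        small_center, small_radius = a, a
        cx = 2 * a - r                           # touches the line CD (x = 2a) from the left
    else:
        small_center, small_radius = 2 * a + b, b
        cx = 2 * a + r                           # touches CD from the right
    # Height chosen so that the circle touches the inner semicircle externally;
    # the internal tangency to the outer semicircle is then the real test.
    cy = 2 * math.sqrt(small_radius * r)
    external = math.hypot(cx - small_center, cy) - (small_radius + r)
    internal = math.hypot(cx - big_center, cy) - (big_radius - r)
    return external, internal

print(tangency_errors(3.0, 2.0, "left"))
print(tangency_errors(3.0, 2.0, "right"))
```

If the formula is right, all four printed errors are zero up to rounding, for either circle and for any positive a and b.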

This elicited a further question — good math is always worth discussing!

Thanks a lot. I think it’s due to the symmetric formula which we obtained, expressed by you in a very neat manner i.e. whatever value of a and b we take, the radius of both circles remains the same. What’s your thought in this matter: can it be proved that both radius are same without finding the general formula? Thanks for your help I appreciate it.

Doctor Rick did a little research:

Hi, Suyash. I’d say that in order to prove that the radii are the same regardless of the location of C on the diameter AB, we need to define variables; the thing to be proved is inherently a generalization, so whatever we come up with is, in a sense, a “general formula”. Perhaps you mean, though, whether we can prove the result without algebra, in a way that the ancient Greeks might have done it?

Recalling that the shape of the region above semicircles AC and CB, and below semicircle AB, is called an Arbelos, I searched for figures like yours and discovered that what we’re talking about is Proposition 5 of the Book of Lemmas, attributed to Archimedes:

Proposition 5:

Let AB be the diameter of a semicircle, C any point on AB, and CD perpendicular to it, and let semicircles be described within the first semicircle and having AC, CB as diameters. Then if two circles be drawn touching CD on different sides and each touching two of the semicircles, the circles so drawn will be equal.

So there must be a classical-style, non-algebraic proof of the theorem. I have seen the two equal circles referred to as “Archimedes’ Circles” or “Archimedes’ Twins”. I haven’t found a copy of the proof from the book yet, and I haven’t tried proving it myself in a more classical framework. Of course, the way they proved this might turn out to be equivalent to what we have done.

Subsequently,

I found an English translation of The Book of Lemmas online. It’s part of the complete works of Archimedes; Proposition V is on page 305:

The Works of Archimedes (Google Books)

I haven’t gone over the proof thoroughly, but you will see that the end of the proof says,

AC . CB = AB . HE

In like manner, if d is the diameter of the other circle, we can prove that

AC . CB = AB . d

Therefore d = HE and the circles are equal.

In other words, he derived the formula d = ab/(a + b), showing that it holds for both circles. He did essentially what we have done in broad outline, but he does not invoke the Pythagorean theorem at all. Interesting.

In the past, we have discussed the arbelos a couple times by name (and probably many other times without naming it):

Circles within a Circle Arbelos Construction

Suyash replied,

Well, thanks for illuminating me with the classical Archimedes circle. I really appreciate your efforts towards solving my each and every query. Seems like this case is closed now and we have done all the required work to find the solution of the problem. Well, thanks a lot for guiding and helping me throughout the course. I appreciate your effort and time and would share how helpful this site is. Cheers!

Doctor Rick said something that is true for all of us:

You’re welcome, Suyash. We put in extra effort because it’s fun! I learned some things in this process.


An interesting question came to us in 2016, where rather than using a well-known formula, it was necessary to work out both what data to use, and how to calculate the desired radius.

The question initially was vague:

Compass, Ruler, and Radius — of a Sphere Using only a compass and a ruler, how can we calculate the radius of a hollow sphere? Obviously it is very easy, but I don't know how to even start! I guess we draw some random circles of equal radius on the sphere surface, but then what? I don't know what to do.

I could think of many possible ways to do this, with various levels of accuracy, but I started with a mix of directness and humor:

Hi, Kyriakos. I see nothing in the problem as stated that indicates it is "obviously very easy." What leads you to come to that conclusion? I imagine there are many methods you could pursue, depending on the precision desired, the kind of compass, and what you are willing to do with it. For starters, some compasses can be opened out enough to be able to use them as calipers: https://en.wikipedia.org/wiki/Calipers (I don't suppose you could just use the point of the compass to threaten bodily harm to the person who made the sphere if he doesn't tell you its radius. "Lateral thinking" puzzles work something like this, what with joke answers for finding the height of a skyscraper using only a barometer, or some other unlikely tool.)

The reference to a barometer is discussed here. We had previously answered serious questions about finding the radius of a basketball (without restriction on tools),

Finding the Radius of a Sphere

and of a tennis ball,

Radius of a Tennis Ball

In the latter, I suggested a way to do it using three rulers. But clearly something more specific is in view here. I continued:

My first serious thought involved using the compass to draw on the sphere: draw one circle on it with a known "radius" (a chord of the sphere); then put the compass point on that circle and draw another circle with the same "radius"; then use the compass to measure the straight-line distance between the two intersections. From that, you could in principle calculate the radius. But I don't want to work out a formula without knowing that's what I want to do.

This turns out to be the method I will be using; but the initial question is open enough to keep thinking. I might be missing something really easy; or there might be more to the rules of the game.

An easier way, though perhaps a little less exact in principle, is to measure the actual diameter of a drawn circle by putting the point on the circle and finding the greatest distance to a point on the other side of the circle. (This is something like the caliper idea, but doesn't require finding the exact opposite point on the sphere.) The "radius" is AB and the "diameter" is BD in the side view below:

               ***********
           ****           ****
       *B**                   ****
     **/|                         **
    * / |                           *
   * /  |                            *
  * /r  |d/2                          *
 * /    |                              *
 */     |                              *
*/      |                               *
A-------C           +                   *
*       |                               *
 *      |                              *
  *     |                             *
   *    |                            *
    *   |                           *
     ** |                         **
       *D**                   ****
           ****           ****
               ***********

From these measurements, there are several ways to calculate the radius R of the sphere, such as trigonometry or similar triangles.

If I had continued with this thought, I would have added two radii to the figure and used the Pythagorean theorem:

               ***********
           ****           ****
       *B**                   ****
     **/|\                        **
    * / |  \                        *
   * /  |   \                        *
  * /r  |d/2  \R                      *
 * /    |      \                       *
 */     |        \                     *
*/      |         \                     *
A-------C-----------O                   *
*   h   |    R-h                        *
 *      |                              *
  *     |                             *
   *    |                            *
    *   |                           *
     ** |                         **
       *D**                   ****
           ****           ****
               ***********

In triangle ABC, we have (defining c = d/2 to avoid fractions) \(h^2 + c^2 = r^2\), and in triangle OBC, \((R-h)^2 + c^2 = R^2\). Expanding the second gives \(2Rh = h^2 + c^2\), which by the first equals \(r^2\), so \(\displaystyle R = \frac{r^2}{2h}\). Solving the first for h, \(h = \sqrt{r^2 - c^2}\), and substituting to eliminate h, we end up with \(\displaystyle R = \frac{r^2}{2\sqrt{r^2 - c^2}}\); replacing c with d/2 and simplifying, \(\displaystyle R = \frac{r^2}{\sqrt{4r^2 - d^2}}\).
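As a quick numerical sanity check of this formula, here is a minimal Python sketch. The function names and the sample values R = 10 and r = 6 are illustrative choices of mine, not part of the original exchange. It starts from a known sphere, works out by trigonometry what the compass would measure, and then tests whether the formula recovers R:

```python
import math

def measured_diameter(R, r):
    """Diameter d of the circle drawn on a sphere of radius R with compass opening r.

    The chord r subtends an angle theta at the center, with r = 2 R sin(theta/2);
    the drawn circle then has true radius R sin(theta).
    """
    theta = 2 * math.asin(r / (2 * R))
    return 2 * R * math.sin(theta)

def sphere_radius(r, d):
    """The formula from the text: R = r^2 / sqrt(4 r^2 - d^2)."""
    return r**2 / math.sqrt(4 * r**2 - d**2)

R, r = 10.0, 6.0                  # hypothetical sample values
d = measured_diameter(R, r)
print(d, sphere_radius(r, d))     # the second number should be very close to R
```

Because the "measurement" here is computed from the central angle rather than from the two right triangles, agreement is a reasonably independent check of the algebra.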

But we needed more information before pursuing any particular method.

Pending more details from you, I think that leads to a decent answer, but I don't know that there isn't a much easier answer, perhaps even an "obvious" one. [One is to "rock" the ruler along the sphere to measure the arc length AB, then use that and r, but that requires numerical approximation rather than a formula.] So, can you tell me where the question came from?

The reply added more rules, and a little context:

Many thanks for your reply! To start with, I found this question on a Greek blog of math problems and riddles. I deduced it must be easy because several people had already claimed solutions. But they did not make these solutions visible, which only made my friends and me more curious. If you can work further on the first method, the one with the two intersecting circles, I would be grateful! For the other one, we must accept some approximation.

“Easy”, of course, is in the mind of the beholder.

And either method really requires approximation in practice; but the second does require a way to find opposite points on the circle, so although it might turn out to be just as accurate in practice, we are evidently looking for a theoretically precise answer. (I suspected there were also some hints as to the expected method, that I wasn’t privy to. Sometimes I will search for a problem across the web in hopes of finding the original wording, but this was in Greek so I couldn’t do that.)

I said that, but Kyriakos reported explicitly this time that my first method was what he wanted:

It seems that the two-circles method leads to a result that makes sense. Draw two overlapping circles of the same radius r. (Compared to the sphere radius, this has to be relatively small). Construct a segment to connect the two points where these intersect. Call its length "L," and measure this with the ruler. Then the sphere radius is

  (r/2)*sqrt{(4*r^2 - L^2)/(3*r^2 - L^2)}

But I cannot prove this result, so I would appreciate any assistance in explaining it. My friend, who gave me the original question, received this solution (or rather "reply") from someone else. It did not come with any explanation; and furthermore, we don't even know if it is correct (only by intuition).

(Of course, you can’t really measure the segment directly with a ruler; he means to transfer the distance to a ruler using the compass.)

So I dove into the harder but more interesting method, now seeking to derive the reported formula:

I've been too busy to take the time to try to derive the formula until today, but this morning I managed it in perhaps ten minutes of free time. Then I had to write it up carefully....

Here is a picture:

We have a sphere with radius R, center A, and two circles with radius r, centers B and C, that intersect at D and E. The distance DE is d. We want first to express d in terms of R and r, and then to solve for R.

First consider isosceles triangle ABC, with legs R and base r. We conclude that the altitude of this triangle, from A to the midpoint of BC, is

  H = sqrt(R^2 - (r/2)^2) = sqrt(4R^2 - r^2)/2

For clarity, r is not really the radius of the circles (whose centers are inside the sphere), but the length of a chord from the “center” B or C to any point on the circumference. Likewise, DE is a straight line distance measured through the sphere. F is the midpoint of BC, and G is the midpoint of DE, both in the interior. Distance H is |AF| in the figure. I used GeoGebra to make the figure.

In the next figure we have a cross-section through ADE; as seen here, the two circles look like the same ellipse, and B and C on the sphere coincide with F. But all the lines shown are coplanar. We first look at triangle ADF (which is not isosceles):

Now consider the plane of triangle ADE:

In triangle ADF, DF is the altitude of the equilateral triangle BCD, h = r sqrt(3)/2, and AF = H = sqrt(4R^2 - r^2)/2. By the law of cosines,

  cos(DAF) = [R^2 + H^2 - h^2]/[2RH]
           = [R^2 + (4R^2 - r^2)/4 - 3(r^2)/4]/[2R sqrt(4R^2 - r^2)/2]
           = [R^2 + R^2 - (r^2)/4 - 3(r^2)/4]/[R sqrt(4R^2 - r^2)]
           = [2R^2 - r^2]/[R sqrt(4R^2 - r^2)]

Next, we look at triangle ADG, which shares the angle DAF, and then use the previously determined cosine to find an expression for d:

In right triangle ADG, where G is the midpoint of DE, we have

  sin(DAF) = (d/2)/R

Now,

  d = 2R sin(DAF)
    = 2R sqrt(1 - cos^2(DAF))
    = 2R sqrt(1 - [2R^2 - r^2]^2/[R sqrt(4R^2 - r^2)]^2)
    = 2R sqrt(1 - [2R^2 - r^2]^2/[R^2 (4R^2 - r^2)])
    = 2R sqrt([R^2 (4R^2 - r^2) - [2R^2 - r^2]^2]/[R^2 (4R^2 - r^2)])
    = 2R sqrt([4R^4 - r^2R^2 - 4R^4 + 4r^2R^2 - r^4]/[4R^4 - r^2R^2])
    = 2R sqrt([3r^2R^2 - r^4]/[4R^4 - r^2R^2])
    = 2 sqrt([3r^2R^2 - r^4]/[4R^2 - r^2])

We have now expressed d in terms of r and R; now we solve for R, as planned:

Solving this for R,

  d^2 = 4[3r^2R^2 - r^4]/[4R^2 - r^2]
  d^2 [4R^2 - r^2] = 4[3r^2R^2 - r^4]
  4d^2R^2 - d^2r^2 = 12r^2R^2 - 4r^4
  4d^2R^2 - 12r^2R^2 = d^2r^2 - 4r^4
  [4d^2 - 12r^2]R^2 = d^2r^2 - 4r^4
  R^2 = [d^2r^2 - 4r^4]/[4d^2 - 12r^2]
      = r^2[d^2 - 4r^2]/[4(d^2 - 3r^2)]
  R = r/2 sqrt([d^2 - 4r^2]/[d^2 - 3r^2])
    = r/2 sqrt([4r^2 - d^2]/[3r^2 - d^2])

And that's your formula.

Formatted nicely, the formula is \(\displaystyle R = \frac{r}{2} \sqrt{\frac{4r^2 - d^2}{3r^2 - d^2}}\).
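The formula can also be tested numerically with explicit coordinates. The following is only a sketch under my own assumptions: the sphere is centered at the origin, B and C are placed symmetrically in the xz-plane, and the function names and sample values R = 10, r = 4 are hypothetical. It locates D and E directly, confirms they are at distance r from both B and C, and then feeds the resulting d into the formula:

```python
import math

def sphere_radius_two_circles(r, d):
    """The formula from the text: R = (r/2) sqrt((4r^2 - d^2) / (3r^2 - d^2))."""
    return (r / 2) * math.sqrt((4 * r**2 - d**2) / (3 * r**2 - d**2))

def intersection_points(R, r):
    """Return B, C, D, E on a sphere of radius R centered at the origin.

    B and C are the compass points, with straight-line distance |BC| = r;
    D and E are the two points of the sphere at distance r from both B and C.
    """
    sin_a = r / (2 * R)                     # half of chord BC, divided by R
    cos_a = math.sqrt(1 - sin_a**2)
    B = (R * sin_a, 0.0, R * cos_a)
    C = (-R * sin_a, 0.0, R * cos_a)
    # D lies in the plane x = 0 (equidistant from B and C);
    # requiring |DB| = r gives cos_t = (1 - r^2 / (2 R^2)) / cos_a.
    cos_t = (1 - r**2 / (2 * R**2)) / cos_a
    sin_t = math.sqrt(1 - cos_t**2)
    D = (0.0, R * sin_t, R * cos_t)
    E = (0.0, -R * sin_t, R * cos_t)
    return B, C, D, E

R, r = 10.0, 4.0                            # hypothetical sample values
B, C, D, E = intersection_points(R, r)
d = math.dist(D, E)
print(math.dist(B, C), math.dist(D, B), math.dist(D, C))  # each should be close to r
print(sphere_radius_two_circles(r, d))                     # should be close to R
```

Note that when r is small compared with R, the denominator \(3r^2 - d^2\) is a small difference of nearly equal numbers, so in a real measurement small errors in d would be magnified considerably.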

I’ve played with this and see no nice, “obvious” geometrical meaning for it. But one thing we can do to check it out is to try some special cases.

First, what if the sphere is really a plane (that is, the radius is “infinite”)? Then we just have the familiar figure of two overlapping circles in a plane, and can easily calculate that \(d = r\sqrt{3}\) (twice the altitude of an equilateral triangle). Putting that into the formula, \(\displaystyle R = \frac{r}{2} \sqrt{\frac{4r^2 - 3r^2}{3r^2 - 3r^2}} = \frac{r}{2} \sqrt{\frac{r^2}{0}}\) which is undefined as expected.

Second, what if each circle were a great circle? Then \(r = R\sqrt{2}\) and \(d = 2R\). Putting those into the formula, we get \(\displaystyle R = \frac{R\sqrt{2}}{2} \sqrt{\frac{8R^2 - 4R^2}{6R^2 - 4R^2}} = \frac{R\sqrt{2}}{2} \sqrt{\frac{4R^2}{2R^2}} = \frac{R\sqrt{2}}{2} \sqrt{2} = R\) as expected.

]]>Let’s start by taking a historical look:

History of Circle Area Formula

Do we know who figured out that pi r squared is the area of a circle? I can find out about the history of Pi and the circumference of a circle, but not its area. I looked through your FAQs and on Google but to no avail. Perhaps it is just not known?

This was a tricky question to answer, because the very idea of formulas came long after the first people to find areas. I pointed to an early analogue of the formula:

It's hard to answer that question, because the area of a circle was known long before pi was actually used. Proposition 2 of book XII of Euclid's Elements, which was undoubtedly known before Euclid himself, is equivalent to the formula A = pi r^2:

Euclid's Elements Book XII, Proposition 2
http://aleph0.clarku.edu/~djoyce/java/elements/bookXII/propXII2.html

Circles are to one another as the squares on their diameters.

That is, the area of a circle is proportional to (2r)^2, which in turn is proportional to r^2. All that is lacking here is a name for the constant of proportionality, which has been called pi since 1706.

I could also have mentioned an ancient Egyptian method that comes a little closer to being a formula, which I find described here and here, taken from the Rhind papyrus. (We had answered a question about it here.) As a formula, we would express it as \(A = \left(d - \frac{d}{9}\right)^2\); in words, as translated in the first reference, it looks like this:

Example of finding the area of a round field with a diameter of 9 khet. What is its area?

Take away 1/9 of its diameter, namely 1. The remainder is 8. Multiply 8 times, making 64.

Therefore the area is 64 setjat.

(Early math was described by example like this, rather than being stated as a general formula.)
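To see how close the Egyptian rule comes to the modern formula, here is a small Python sketch (the function names are my own, purely for illustration). It reproduces the papyrus example with a diameter of 9 and compares it with \(\pi r^2\); it also prints the value of pi that the rule implicitly uses, since \(\left(d - \frac{d}{9}\right)^2 = 4\left(\frac{8}{9}\right)^2 r^2\):

```python
import math

def egyptian_area(d):
    """Rhind papyrus rule: take away 1/9 of the diameter, then square the remainder."""
    return (d - d / 9) ** 2

def modern_area(d):
    """Area from the modern formula A = pi r^2, with r = d/2."""
    return math.pi * (d / 2) ** 2

d = 9                                  # the diameter used in the papyrus example
print(egyptian_area(d))                # the papyrus answer, 64
print(modern_area(d))                  # a little less than 64
print(4 * (8 / 9) ** 2)                # the constant playing the role of pi in the rule
```

The two areas differ by well under one percent, which is quite good for a rule stated only by example.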

But I moved on to the recognition that the number used is the same *pi* that is used to find the circumference:

There are two parts to your question: who discovered that the area is SOMETHING times the square of the radius (for which the answer is whoever gave Euclid his proof, commonly considered to be Eudoxus); and who discovered that the constant of proportionality is pi. The answer to the latter question is Archimedes. The form in which Archimedes stated it was that the area of a circle is equal to that of a right triangle whose base is the circumference of the circle, and whose height is the radius of the circle. That is, A = 1/2 (2 pi r) r = pi r^2 in modern terms. So except for the lack of algebraic notation and a name for pi, he got the entire formula. You may be aware that he also worked out the value of pi.

This is really somewhat remarkable; pi kills two birds with one stone! I gave a reference to the proof of this fact; we have also discussed it on our site, here:

Archimedes and the Area of a Circle

We have discussed various derivations of the area formula many times; I will show a few different ways here. First, from 2000:

Deriving the Area Formula for a Circle

Why is the area of a circle the square of the radius times pi?

Doctor Floor answered this time, including a picture:

Let's consider a circle with radius r. If we divide the circle into an even number of sectors, we can rearrange these sectors as in the following figure: The result is a sort of wrongly formed rectangle, but we know that the shorter "side" of this rectangle has length r, and that the longer "side" is half the perimeter, hence pi*r. The more sectors we make, the more accurate our rectangle becomes. We can imagine that if we divided the circle into an infinite number of sectors, it would become a rectangle.

All derivations of this sort that we give fall short of being actual proofs, because we have to imagine what would happen for infinitely many pieces. Ultimately, this requires calculus to make it rigorous, though early versions (such as Archimedes’) used it in a disguised form. Our goal here is just to see that it makes sense. In the picture above, with more pieces, the sides would slope more and more steeply, approaching the vertical, and the top and bottom would become more and more flat, approaching horizontal lines. The area then will be the height (r), times the base (half the circumference):

Whatever the number of sectors we use, the "side" lengths will remain r and pi*r. Therefore our limit case with an infinite number of sectors still has sides r and pi*r, and the area of this limit rectangle is pi*r^2. Since the area of the circle does not change when we divide it into parts, the area of the circle must have been pi*r^2, too.

Doctor Dotty gave a longer version of this demonstration here:

Circle Formulas: Area and Perimeter

A similar demonstration can be done without making a rectangle:

Formula for the Area of a Circle

How do you get the area of a circle? I haven't figured any of it out, but I want to know how to do it. Please help.

I started by showing the formula:

Hi, Kismet. I'm not sure whether you're asking for the formula for the area of a circle, or for an explanation of how it works. I'll give you both.

The formula is very simple:

  A = pi * r^2

which means the area is Pi (3.14159...) times the square of the radius. In a book it would look more like this:

      __  2
  A = || r

To use this formula, just measure the radius of the circle (which is half the diameter), square it (multiply it by itself), and then multiply the result by 3.14.

Then, to show that the formula didn’t come from thin air, I showed a way to think of it:

There's an interesting way to see why this is true, which may help you remember it. (Though the easiest way to remember the formula is the old joke: "Why do they say 'pie are square' when pies are round?") Picture a circle as a slice of lemon with lots of sections (I'll only show 6 sections, but you should imagine as many as possible):

      *     *
   *  \     /  *
 *     \   /     *
*       \ /       *
*--------+--------*
*       / \       *
 *     /   \     *
   *  /     \  *
      *     *

Now cut it along a radius and unroll it:

    /\      /\      /\      /\      /\      /\
   /  \    /  \    /  \    /  \    /  \    /  \
  /    \  /    \  /    \  /    \  /    \  /    \
 /      \/      \/      \/      \/      \/      \
**************************************************

All those sections (technically called sectors of the circle) are close enough to triangles (if you make enough of them) that we can use the triangle formula to figure out their area; all together they are

  A = 1/2 b * h = 1/2 C * r

since the total base length is C, the circumference of the circle, and the height of all the triangles is r, the radius (if the triangles are thin enough). You should know that the circumference is pi times the diameter, or

  C = 2 * pi * r

(this is actually the definition of pi), so the area is just

  A = 1/2 (2 * pi * r) * r = pi * r^2

In other words, the area of a circle is just the area of a triangle whose base is the circumference of the circle, and whose height is the radius of the circle.

This version of the formula can be very memorable. I find that many students have learned the standard formula almost as an incantation — though many forget whether it is the formula for area or circumference. Relating it visually to triangle areas may help.

In my explanation, I wanted to avoid deep algebra (because the student was young), so I glossed over a detail I might have clarified. Properly, I should have shown why we can just use the total base length as if it were one triangle. That amounts to the distributive property: \(A = \sum \frac{1}{2} b_n h = \frac{1}{2} \left(\sum b_n \right) h = \frac{1}{2} C r\).

I concluded,

What I've just done gets pretty close to algebra, which you haven't learned yet, but if you think about it (and maybe try actually measuring some real circles, or even make some lemonade) you should be able to see what I mean. You probably didn't know that the area of a circle is the same as the area of a triangle!

Doctor Jerry gave essentially the same derivation here:

Why is Area of a Circle Equal to Pi * (Radius Squared)?

My favorite derivation comes by way of a formula that applies to any regular polygon, and also answers the question, How can you find the area of a circle without using pi?

Why Pi?

Dr. Math, I was just wondering... Why do we use pi when we calculate the circumference and area of a circle? I think one of my professors once told my class but I can't remember and am curious.

I first dealt with the circumference question, which is both simple and subtle:

Hi, Crystal. There are two questions here, with very different answers. First, for the circumference, it's because we DEFINE pi as C/D, so we can write C = pi D automatically. There's a trick hidden behind that definition, though: how do we know that C/D is the same for every circle? That takes a bit of proof, and leads to some interesting ideas; look in the Dr. Math FAQ on pi: Pi = 3.14159... http://mathforum.org/dr.math/faq/faq.pi.html or in the following answers in particular, for an explanation: Why is Pi a Constant? http://mathforum.org/library/drmath/view/57828.html Einstein, Curved Space, and Pi http://mathforum.org/library/drmath/view/55198.html Is Pi a Constant in Non-Euclidean Geometry? http://mathforum.org/library/drmath/view/55021.html

Because the ratio of circumference to diameter is a constant (as long as we are working in a plane), we can give it a name (pi) and then use that definition to find circumference. Given the diameter, we use \(C = \pi D\); or, given the radius, we use the fact that \(D = 2r\) and substitute, so that \(C = 2 \pi r\).

Area, though, is a very different matter.

For the area, there's a nice way to see why the formula should be what it is. Let's think about regular polygons first, and look at the relation between their areas and perimeters. Any n-sided polygon can be broken into n isosceles triangles like this:

   +-----+
  / \   / \
 /   \ /   \
+-----+-----+       +
 \   / \   /       /|\
  \ /   \ /       / |a\
   +-----+       +--+--+
                    s

Each of these triangles has a base that is equal to a side s of the polygon, and a height a (called the apothem); the total area is

  A = n * sa/2

This can be rearranged as

  A = (ns)a/2

and since ns is just the perimeter P of the polygon, this means

  A = Pa/2

That can be a very useful formula in itself; it looks a lot like the formula for a triangle. But we can apply this to a circle:

Now make n very large, and a will be very close to the radius r of the circle the polygon is becoming. We can see (and could prove more carefully if we took the time) that for a circle, A = Cr/2 where C is the circumference (perimeter) and r is the radius. But since we know C = 2 pi r this becomes A = 2 pi r * r/2 = pi r^2 We're done! Because we could find the area of a polygon using its perimeter, we can find the area of a circle using its circumference, and that uses pi.
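The limiting process can be watched numerically. In this Python sketch (my own illustration, not part of the original answer), I assume the regular polygon is inscribed in a circle of radius r, so that by standard trigonometry each side is \(s = 2r\sin(\pi/n)\) and the apothem is \(a = r\cos(\pi/n)\); the area is then computed only from the perimeter and apothem, as A = Pa/2:

```python
import math

def inscribed_polygon_area(n, r):
    """Area of a regular n-gon inscribed in a circle of radius r, via A = P a / 2."""
    s = 2 * r * math.sin(math.pi / n)   # side length
    a = r * math.cos(math.pi / n)       # apothem
    P = n * s                           # perimeter
    return P * a / 2

r = 1.0
for n in (6, 24, 96, 10000):
    print(n, inscribed_polygon_area(n, r))
print("pi r^2 =", math.pi * r**2)
```

As n grows, the perimeter approaches the circumference, the apothem approaches the radius, and the printed areas approach \(\pi r^2\). (Using `math.pi` to compute the sides is of course circular as a way of finding pi; the point here is only to illustrate the convergence.)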

This approach can be turned into a real proof more easily than the others, which is why I like it more. There is no hand-waving about a curved figure becoming a rectangle or triangle. Yet the basic idea is identical to the methods I showed first.

This formula \(A = \frac {Cr}{2}\) is what I referred to earlier, an area formula that doesn’t use pi. It showed up in the triangle derivation above, and is the answer to the question posed here:

Finding a Circle's Area Without Pi

For a longer version of the same approach, using trigonometry, see

Areas of N-Sided Regular Polygon and Circle]]>

Here is the initial question:

1/2 of a number is 2 more than 1/3 of the number. What is 1/3 of the number?

a. 2

b. 4

c. 8

d. 12

Doctor Rick took the question, and first, as we usually do, pointed out that in order to help, we need some information about the student’s knowledge and needs. When we are just given a problem, we can’t tell what kind of help is needed; the best thing is for the student to show his work so we can see where, if at all, they made a mistake. Sometimes they turn out to be entirely correct, and it was the book that was wrong — so it would be a waste of time to write out a full explanation. Other times, the student is not learning algebra at all, so again, an algebraic solution would be wasted.

But, as we often do, he gave a suggestion to get started, in case that was all that was needed:

One way this problem might be solved is to “translate” the statement into an algebraic equation. Is that something you’ve been learning to do? For instance, give “the number” the name “x”. You’re asked to find “1/3 of the number”, which you can write as (1/3)x or x/3. How would you write “1/2 of the number”? How would you write “2 more than 1/3 of the number”?

In this case (as often happens), the student is not fluent in English, which could be the reason for not giving a full explanation. We find that in such a case, it is better to say more rather than less — the more a student says about his thinking, the more likely we will understand it, even if he doesn’t express it clearly. Here is the response (a little cleaned up):

Not good in English sorry. But I’ll try.

To be honest, I don’t know where to start because I don’t know how to approach this problem. It says “more than” so I think it’s an addition so:

x/2(2) + x/3

…

Then… I’ll need to find 1/3 of that number? I can’t understand it.

Seeing this work gave Doctor Rick a good idea of what was needed. He replied,

Thanks for the additional information about how you’re thinking. That’s the sort of thing I needed.

The problem was:

1/2 of a number is 2 more than 1/3 of the number. What is 1/3 of the number?

You said:

It says that “more than” so I think it’s an addition so:

x/2(2) + x/3

OK, I see some good things here. You translated “1/2 of a number” as x/2, and “1/3 of the number” as x/3. You’re correct that “more than” in this sentence refers to addition (sometimes, as you’ll see later, it can mean other things.)

There are two main difficulties with what you wrote. One is that it isn’t an equation — you didn’t get an equal sign (=) into it. That’s how I’d translate the word “is”. The other problem is that you are not taking the grammar of the sentence into account.

Let’s work step by step, translating one part at a time rather than trying to write the equation all at once. Here’s what I mean:

1/2 of a number is 2 more than 1/3 of the number.
\_____________/ \/  \_______/  \_______________/
      x/2       =      2 +            x/3

Wow, I’m done — I’ve written an equation! It won’t always work this neatly — in this case, the sentence is in nearly mathematical language right from the start. The same thing wouldn’t work for other problems you have sent!

OK, now you have an equation to solve. Can you find the value of x that makes the equation true? And then can you figure out what “1/3 of the number” is?

Doctor Rick has done two things here: one is to **look at the whole sentence** and translate it into an equation, rather than a mere expression; and the other is to **not** look at the whole sentence at once, but to **break it into parts** (according to its grammar) in order to build up the equation part by part. In this case, it was easy to do both things at once; more complicated sentences often require working on each piece separately to avoid confusion, and changing the order of parts. For example, if it had said “2 **less than** …”, we couldn’t just write “2 – …”, because the order would be wrong. We’d have to say x/3 – 2 instead. So matching up phrase after phrase in order is not always appropriate, but it was convenient here.

This appeared to be the hard part for this student. We like to leave as much as possible for them to solve themselves, so that they get practice doing the work themselves; here it would have been hard to explain the ideas without doing the whole translation, but we can at least leave the solving to the student, who can probably manage that. And Doctor Rick was right about that:

OMG!!! Yah, the ‘is’ is =. Here’s my solution:

x/2 = 2 + x/3

x/2 – x/3 = 2

(3x – 2x)/6 = 2

x = 12

Since the question is what is the 1/3 of the number

x/3 = 12

3(x/3 = 12)

x = 4

Aaaaaaahhhh!!! And guess what! I got it right!!!!!

Well, there are some things to clean up here, but encouragement was appropriate, and Doctor Rick responded encouragingly,

You got it correct, and you are justified in feeling good about yourself! Keep it up!

But let’s go back and look at a few details in the solution.

First, the student used a good method, and showed his work quite well: not too much detail, but including the important steps. His approach was to subtract x/3 from both sides, then combine the left side into a single fraction, which reduces to x/6; then, multiplying both sides by 6, the result is x = 12.

Another approach, often taught in books and saving some work (or at least some writing, or some opportunities to make mistakes in using fractions), is to “clear fractions” before solving. The LCD of all the fractions in x/2 = 2 + x/3 is 6, so we multiply each term by 6 in order to cancel with each denominator:

x/2 = 2 + x/3

6*x/2 = 6*2 + 6*x/3

3x = 12 + 2x

Now, with no fractions to trip over, we can just subtract 2x from each side and solve:

3x – 2x = 12

x = 12

Second, the student was wise enough not to stop there, but to see that the question was not to give the number, but rather 1/3 of the number. So the answer is 1/3 of 12, which is 4.

But the work he *wrote* was wrong! It is not uncommon for a student to think correctly, but not yet be skilled in writing what he is thinking. Here, he changed x = 12 to x/3 = 12, replacing x with the x/3 that is desired; what he should have done is to divide both sides by 3, x/3 = 12/3 = 4. I can only guess that, as he approached the finish line, he started thinking and writing too fast, and stumbled. We’ll give him credit.

Let’s take one more step, and check the answer. Recall again that the problem was, “1/2 of a number is 2 more than 1/3 of the number. What is 1/3 of the number?” We found that the number is 12. What is 1/2 of the number? 6. What is 1/3 of the number? 4. Is 6 2 more than 4? Yes. If we had misinterpreted “2 more than” in translating the equation, we probably would interpret it correctly in this more familiar process, so we would have a chance to discover our error.

And, again, part of checking is seeing whether we answered the question that was asked. What is 1/3 of 12? 4. We got it.
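The same solve-and-check routine can be carried out with exact fractions. This is just a sketch using Python's standard `fractions` module; the variable names are my own:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

# x/2 = 2 + x/3  is the same as  (1/2 - 1/3) x = 2
x = Fraction(2) / (half - third)
answer = third * x              # the question asks for 1/3 of the number

print(x, answer)

# Check against the original wording: 1/2 of the number is 2 more than 1/3 of it.
assert half * x == 2 + third * x
```

The final line is the check described above: it substitutes the number back into the original statement rather than into any of the intermediate steps.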

Let’s take this in yet another direction. I mentioned at the start that this problem might have been given to a student who is not learning algebra, though that tends to be our first assumption. How might we solve it then?

The problem, once again, was, “1/2 of a number is 2 more than 1/3 of the number. What is 1/3 of the number?” The question might lead me to focus my attention not on the number itself, but on 1/3 of the number; I may never need to know what the number itself is! Here is a bar representing the number, with 1/3 of it shaded:

+-----------+-----------+-----------+
|XXXXXXXXXXX|           |           |
+-----------+-----------+-----------+

Here I’ve split each third in halves, so I can mark 1/2 of the number:

+-----+-----+-----+-----+-----+-----+
|XXXXX|XXXXX|     |     |     |     |
+-----+-----+-----+-----+-----+-----+
\_________________/

Since the 1/2 is 2 more than 1/3, that difference must be 2:

+-----+-----+-----+-----+-----+-----+
|XXXXX|XXXXX|  2  |     |     |     |
+-----+-----+-----+-----+-----+-----+
\_________________/

So each piece is 2, and the shaded region (two pieces) is 4. That’s the answer.

This sort of visual thinking is taught commonly today, and is a precursor to algebra. It requires more creativity than algebra, which is just a way to turn any problem into the same kind of symbolic problem, so that you don’t need any special thinking for each problem, but can follow a nearly mindless routine. The thinking I did here involved focusing deeply on the fractions (using a common denominator, for example, without actually saying so) and on relationships. That makes it a very useful experience. Don’t rush to learn (or teach) algebra — being easier (though students are surprised to realize that), it takes away the chance to learn to think deeply!

]]>