Permutations and Combinations: Undercounts and Overcounts

(A new question of the week)

We have been looking at some combinatorics questions, both easy and challenging. Some questions have come to us in recent weeks that can illustrate how to think your way through relatively difficult problems, including catching errors and interpreting a textbook’s solutions. We’ll see yet again that there are usually multiple ways to solve these, and the book’s way is not always the best.

Undercounting Mississippi permutations

Here is the first question:

I am trying to solve this question:

Find the number of words made using all the letters of the word “MISSISSIPPI” such that both P’s are together but all I’s are separate.

I solved this question like this:

Firstly I have permuted M, S, S, S, S as (5!/4!) [I have applied the concept of permutation of alike objects taken all at a time]

So, as there are 5 objects in total, hence, the total gaps created is 6.

Now consider [PP] as a single unit.

The objects yet to be arranged are I, I, I, I, [PP]

So, select 5 gaps out of six to place the 5 objects= 6C5

And finally arrange the five objects like this = (5!/4!) [Divide by 4! because there are 4 similar objects – 4 Is]

Hence the final answer is (5!/4!) * 6C5 * (5!/4!)

But the correct answer given in my textbook is (6!/4!)*7C4*1

I don’t understand where am I doing wrong?

MISSISSIPPI is a common example in permutations of multisets (“words” with repeated letters, or “permutations of alike objects”), just asking how many permutations there are. There is a formula for this, which I will be discussing in a later post; but for the moment we won’t really need it.

Let’s see what Navneet has done. He first counted the ways to permute the letters that are not constrained in the problem, M S S S S. He used the formula he was taught; but we can just observe that we can put the M in any of 5 places, so there are 5 ways to arrange them.

Next, he inserts both P’s (as a single unit) and the 4 I’s; there are 6 places each of these 5 items can go: _ M _ S _ S _ S _ S _ (however the M’s and S’s have been placed). So he chooses 5 blanks (6 ways), and arranges the 5 items in them (5 ways), for a total of \(5\cdot 6 \cdot 5 = 150\) ways.

But the book gives a different expression, with a different value, as its answer. What went wrong? I can see an error; can you? These things can be subtle. We also have the question, what is it that they did?

Doctor Rick answered, first commenting on the book’s answer:

I don’t really understand why a textbook would give an answer like this — evidently the individual factors are intended to indicate the thinking behind them (otherwise why include a factor of 1?), but they do not do the job. It takes an explanation in words, such as you have given; thanks for doing that!

I notice that your answer is 1/7 of the book’s answer; you are somehow significantly undercounting. Let’s look at your work:

Firstly I have permuted M, S, S, S, S as (5!/4!) [I have applied the concept of permutation of alike objects taken all at a time]

So, as there are 5 objects in total. Hence, the total gaps created is 6.

Now consider [PP] as a single unit.

The objects yet to be arranged are- I, I, I, I, [PP]

The last line above is where I see something wrong. Your method will find the number of permutations with the P’s together but not adjacent to another I or to a P. For instance, you don’t count MISSISSIPPI, though it fits the conditions. It’s OK for two I’s to be separated by PP as well as by M or S. Maybe that is enough for me to say; try correcting your work with this hint.

So, select 5 gaps out of six to place the 5 objects= 6C5

And finally arrange the five objects like this= (5!/4!) [Divide by 4! because there are 4 similar objects – 4 I’s]

By the way, I wouldn’t bother with 5!/4!, either here or earlier when you permuted M, S, S, S, S. I’d just think: There are 5 possible positions for the M to go, and the other four positions are occupied by S’s. What you do isn’t wrong, but it’s overkill. Likewise, to select 5 gaps out of 6, I’d just think: Pick the one gap I won’t be using. There are 6 choices.

So Navneet’s method is nice, but missed some possibilities, which we call an undercount. He accidentally excluded too much.

Navneet responded,

Thank you Doctor Rick!

I have understood where I was doing wrong.

Of course, that leaves us hanging! We see what was wrong; but we still need to find a way to get the right answer. What process can we follow to make all the legal choices, and no illegal ones?

Maybe we just need to slow down. The P’s and the I’s have different restrictions, so we should deal with them separately. We have 5 ways to arrange MSSSS; now let’s put in [PP] (keeping it as a unit): it can go in any of the 6 slots in _ M _ S _ S _ S _ S _, so we now have a total of 30 ways to arrange MSSSS[PP]. Now we can insert the 4 I’s; since they have to be separate, each must go in one of the 7 slots in _ M _ S _ S _ S _ S _ [PP] _. So we choose 4 of 7 slots, and then we are finished. The total number of possibilities is \(5\cdot6\cdot _{7}\!C_{4}=30\cdot35=1050\).

I suspect that the book’s author used the multiset permutation formula to arrange MSSSS[PP], followed by our insertion of the I’s.

Overcounting cricketer combinations

Following this, Navneet had a new problem:

I am stuck on another problem:

A team of 11 is to be selected out of 10 batsmen, 5 bowlers, and 2 keepers such that in the team at least 4 bowlers should be included. Find the number of possible ways of selection.

I tried to solve this question like this:

First select 4 bowlers out of 5 = 5C1

Then, remaining candidates = 10+2+(5-4) = 13

Hence, select the remaining 7 players out of 13 = 13C7

So, my final answer is 5C4*13C7

But, this is a wrong answer.

The correct answer given is (5C4*12C7)+(5C5*12C6)

Please explain me where I am doing the error?

Also, can you please tell me what should I check or do in order to avoid such errors in future?

Again, a well-asked question, showing his thinking along with the problem and the provided answer, so we have all we need. He got 8580, while they say 4884. Why?

Doctor Rick responded:

I thought first of the same approach you took. Then I considered it more carefully, looking to see if I had missed any possibilities or if I had counted any selection more than once.

I then realized that I was overcounting, and here’s why: You’re selecting four bowlers to include first, and then maybe the fifth bowler will be among the remaining 7 players you choose. But if you chose a different set of four bowlers to start, and then the fifth bowler, you’d end up with the same set of 11 players — you just picked all five bowlers in a different order.

More specifically:

Say the five bowlers are A, B, C, D, and E. When you select four bowlers, you might get A, B, D, and E. Then when you select 7 more players, you might get C — or you might not. If you do, then you get the same set of players as if you had first selected B, C, D, and E, then got A as one of the 7 more players. So you have counted this same set of players more than once. (In fact you counted it five times, one for each bowler you could have left out of the initial selection of bowlers (and then gotten among the other seven).

This is the essence of overcounting: selecting in such a way that the same selection can be made in more than one way, so that it is counted more than once. There are several ways to fix this: you can do this, but then correct the overcount as an additional step (this is what the inclusion-exclusion principle does, subtracting the excess that was added); or you can break things down into cases, adding together counts that do not overlap.

Since we know the book’s answer, we can also try to learn from that:

Have you considered how they got their answer? As I said, just showing an expression like this does not constitute an explanation, but it does supply some hints. In particular, we can see that the writer used two disjoint cases; I think you can figure out what those cases are.

That is a significant hint. Since Navneet just moved on without giving his answer, I will state it here: The two cases are “exactly 4 bowlers” and “exactly 5 bowlers”. In the first case, we have \(_{5}C_{4}\) ways to choose the 4 bowlers, and \(_{12}C_{7}\) ways to choose the remaining 7 from the 12 non-bowlers; in the second case, we have \(_{5}C_{5} = 1\) ways to choose the 5 bowlers, and \(_{12}C_{6}\) ways to choose the remaining 6 from the 12 non-bowlers. The total, as given in the book, is \(_{5}C_{4} \cdot\, _{12}C_{7} +\, _{5}C_{5} \cdot\, _{12}C_{6} = 5 \cdot 792 + 1 \cdot 924 = 4884\).

As for the general question of avoiding errors:

All I can say in general is what I said above: Go over your selection method, looking for selections you either didn’t count or counted more than once. If you did either of these, you might be able to modify your approach by using the inclusion-exclusion principle, but you may decide (as the solution’s author did) that it’s easier to take a somewhat different approach.

Miscounting words

Continuing to pursue a better understanding of the whole process, Navneet compared his method of solving this problem with another problem, this time a permutation:

Now, see another question:

Find the number of 4 letter words that can be formed using the letters A, L, H, B, D such that the word ends with D but does not begin with H.

The solution given in my textbook of this question is, (3C1) * (3C2)* (1) * 2!

If you would see clearly then the method of selection used is same in both the solutions.

But in the first question overcounting occurred whereas in the second question no overcounting occurs (but opposite thing occurs, the counting is less) and we have to adjust our answer by multiplying the result with 2!?

Why such strange thing occurs?

Navneet sees this method of solution as paralleling his previous method; how can one be right and the other wrong?

Doctor Rick answered:

As I have said before, I do not “see clearly the method of selection used”, based only on the solution given. I have to think about it.

I take it that the 3C1 refers to selecting one of the letters A, L, B for the first letter of the word (it can’t be H, and it can’t be D because that is taken). Personally, I’d just put 3 ways of choosing the first letter. The (1) surely represents the number of choices for the last letter (it must be D).

Now, the two middle letters remain to be selected from a pool of three remaining letters (since we’ve used two of the five letters already). The order of these two letters does matter, so we use permutations: 3P2 = 6. According to the method you have evidently been taught, this is 3C2 * 2!.

The “undercounting” to which you refer is because, if we don’t take order of the middle two letters into account, the words BLAD and BALD (for example) would be counted as the same word. We don’t want to do that, so we must multiply by 2! (the number of different orderings of any two different letters).

Does this help?

So the “undercounting” he sees here is an entirely different kind than we had before, relating only to the difference between combinations and permutations: combinations ignore order, and we don’t want to. It appears that Navneet has been taught these concepts in a different way than we are accustomed to, perhaps not even using a notation for permutations at all but building everything on the concept of combinations. I think of permutations as more fundamental and natural, so that we start with \(_{n}P_{n} = n!\), generalize to \(\displaystyle _{n}P_{r} = \frac{n!}{(n-r)!}\), and then (because each combination corresponds to all possible permutations of the result) \(\displaystyle _{n}C_{r} = \frac{_{n}P_{r}}{_{r}P_{r}}\).

Navneet was not quite sure yet:

Thanks Doctor Rick!

Almost everything was clear to me.

I now understand that there is no clear cut formula or method to check overcounting, and every time we need to use our logic/thinking to prevent overcounting.

I have a last question:

While solving this question – “Find the number of 4 letter words that can be formed using the letters A,L,H,B,D such that the letter ends with D but does not begin with H.”

Why do we multiply with 2! and not 3! ?

I am asking this because in the question the position of only one letter “D” is fixed.

So, there are remaining 3 places. Hence, we need to multiply with 3! in order to get the total number or arrangement.

And if we multiply by 2! then we permute between only two of the selected letters, not three. Hence, our answer must be wrong (because of less counting). Isn’t it?

Doctor Rick first answered the general comment:

That’s true: There is no substitute for thinking! Formulas and rote procedures are useful when you’re doing the exact same type of problem over and over; but that’s not usually the case with counting problems. Each has something different about it, and it isn’t easy to choose a valid approach.

Likely it is an over-reliance on a taught procedure that led to the error. Doctor Rick continued:

Recall the procedure by which I found the answer last time. First we choose the last letter, because it must be D. Then we choose the first letter, because it has a special condition: it can’t be H (or D, since that must be used for the last letter). This leaves the middle two letters. Trying to follow the procedure you seem to be learning, I pick a pair of letters out of the three remaining (in 3C2 ways), and then multiply by 2! to get the number of orderings of those two letters.

Doing it my own way, I put D in the fourth position (1 choice), then choose one of the letters A, L, B to go in the first position (3 choices). Then I choose one of the remaining three letters (H and those I didn’t use from A, L, B) — 3 choices again. Finally I choose one of the two letters left, for the third position.

3 × 3 × 2 × 1 = 18 ways (the factors are for positions 1, 2, 3, 4 in order)

If it isn’t clear yet, note that we need to treat the first position differently from the middle two, because it has an added condition: it can’t be H. If I were to multiply by 3! instead of 2!, it would correspond to permuting the first three letters; but then I might be moving an H into the first position where it can’t be.

This is the fill-in-the-blanks approach, where we don’t try to follow a formulaic method, but just break the problem down into a series of choices (permutation-first, we might say — taking order into account as long as possible, rather than this teacher’s combination-first approach, multiplying by a factorial at the end).

Navneet remained unsure:

It is correct to take care that we don’t move the letter H into the first position by multiplying with 3!.

But, don’t you think we have missed some cases by just only multiplying with 2! ?

Because when we multiply with 3!, then, there are some extra cases when we are not moving the letter H to the first position.

Like,

ALHD——-to———>LHAD

So, don’t you think we are going to miss many such cases if we just multiply with 2!? (or don’t permute the first position)

I will be thankful for help!

Doctor Rick answered by actually listing the possibilities to leave no doubt:

Here was the problem — modified by stating the assumption that seems to be made in the textbook’s solution:

Find the number of 4 letter words that can be formed using the letters A, L, H, B, D no more than once each, such that the letter ends with D but does not begin with H.

Since the claimed answer is so small (18), we can just list all the 4-letter words that fit the condition:

ABHD  ABLD  AHBD  AHLD  ALBD  ALHD

BAHD  BALD  BHAD  BHLD  BLAD  BLHD

LABD  LAHD  LBAD  LBHD  LHAD  LHBD

Can you show me a word that I have missed?

We multiply by 2! after we have already dealt with all the possibilities for the first letter, A, B, or L (the factor 3C1 in the textbook’s solution) and we have selected two of the remaining 3 letters for the middle two positions (the factor 3C2).

You’ll see that my list above consists of six words beginning with A, six beginning with B, and six beginning with L. Let’s focus on the six words beginning with L. There are 3C2 = 3 choices for the pair of letters in the middle: {A, B}; {A, H}; {B, H}. Each pair can be in either order: AB or BA; AH or HA; BH or HB. That’s the factor of 2!.

You note that some permutations of the first 3 letters are valid — those that don’t put H in the first position. If H is not one of the middle two letters, then we could permute freely among all the first 3 letters, making 6 words. But there are two problems with this.

First, in order to use the multiplication principle, the number of choices must not depend on other choices — but I just said that multiplying by 3! only works when H is not in the pair chosen for the middle two letters.

Second, we will be overcounting: for instance, we can choose L for the first letter and {A, B} for the middle two, or A for the first letter and {L, B} for the middle two, and get LABD by permuting the first three letters in both cases. Thus LABD would be counted twice.

3 thoughts on “Permutations and Combinations: Undercounts and Overcounts”

    1. Which of the three problems are you referring to?

      The first, using all the letters of MISSISSIPPI, requires repetition.

      The second, selecting a cricket team, clearly does not allow repetition, as a person can’t be on the same team more than once.

      The third, making four-letter words, as stated left it unclear whether duplicate letters were allowed (so this is clearly what you were referring to); but Doctor Rick at the end restated it, making his assumption explicit: “Find the number of 4 letter words that can be formed using the letters A, L, H, B, D no more than once each, such that the letter ends with D but does not begin with H.” He made this assumption because it made the book’s answer correct!

      So, yes, as you indicate, removing that condition would change the answer, and the book definitely should have stated it. My suspicion is that this problem was given in a context where all problems were permutations of some sort, so that it was assumed. (In face-to-face interactions with students, I often look at the context in their book to check for such contextual assumptions, or instructions at the start of a set of exercises, in order to make sure I am interpreting a problem correctly.)

      Thanks for the opportunity to emphasize this important point.

  1. Pingback: Probability: Cards vs Dice – The Math Doctors

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.