Pigeonhole Principle II: Sets, Subsets, and Sums

Last time, we looked at the Pigeonhole Principle, applying it to geometrical problems, largely about distances, gradually working from almost literal “balls and boxes” (“pigeons and pigeonholes”) to more abstract applications that are harder to see. Here, we will go beyond that, proving facts about sets.

Integers in a set

Moving further toward the abstract, this is from 2003:

Pigeonholes and Remainders

Demonstrate that in any collection of 52 distinct positive integers, there are two distinct numbers whose sum or whose difference is divisible by 100.

For example, if we take the numbers 

  1, 2, 3, ..., 52 

then one such pair is 

  52 + 48 = 100,  which is divisible by 100

If we take the numbers

  131, 132, 133, ..., 182

then one such pair is 

  132 + 168 = 300, which is divisible by 100 

How can I show that this must be true regardless of which numbers are chosen?

For completeness, we should include a subtraction example: In the set $$\{1,6,11,16,21,26,\dots226,231,236,241,246\},$$ we have the pair 26 and 226, for which \(226-26=200\) which is divisible by 100, among many others.

One approach to such a problem, which I touched on last time, is to imagine trying to make a set of numbers for which there is no pair of numbers whose sum or difference is a multiple of 100. This can at least give us a sense of what might go wrong. One thing I see is that, if we start with consecutive numbers \(1,2,3,\dots\), once we reach 50, we need to skip past everything from 51 to 99, which could be added to numbers we already have to get 100 (e.g. \(49+51=100\)). It would be okay to include 100 as the 51st number, but then 101 can’t be included, because \(101-1=100\) is a multiple of 100. (Notice that this shows that subtraction is a necessary part of the claim.)

So it looks like having more than 51 numbers causes trouble. Maybe there are some pigeonholes here? The tricky part will be the two separate requirements (addition and subtraction).

Doctor Rob answered:

Thanks for writing to Ask Dr. Math, Roshan!

Split the 52 numbers into sets S(n) according to their remainder n when divided by 100, that is, their last two digits.  

If any S(n) contained two or more numbers, then the difference of any two of them would be divisible by 100.  Thus we assume that no S(n) contains two numbers.

This is reminiscent of our last problem last time, where we split pairs of numbers into groups according to parity (even/odd, which likewise involves remainders).

So we have 100 boxes, namely the congruence classes of 1 through 100 (numbers ending in 01, 02, …, 99, 00). The subtraction requirement forces these to have no more than 1 each, and we’re assuming that that requirement is met. We’d like to use a new set of boxes to show that the addition requirement fails.

Now call the sets S(0), S(1), ..., S(99).  Create the following sets:

   T(n) = S(n) U S(100-n),  n = 1, 2, ..., 49,
   T(0) = S(0),
   T(50) = S(50).

There are 51 of these sets T(n).  The Pigeonhole Principle says that of the 52 integers, two of them must lie in one of these 51 sets, say T(k).  

Obviously k cannot be 0 or 50.  Since S(k) or S(100-k) have size at most 1, and their union T(k) has size 2, both must have size exactly one.  Then the numbers which lie in these two sets must add up to a multiple of 100.

Here we’ve combined pairs of boxes containing numbers that add up to 100, resulting in these 51 “boxes”:

$$T(0)=\{100,200,\dots\}\\T(1)=\{1,99,101,199,\dots\}\\\dots\\T(49)=\{49,51,149,151,\dots\}\\T(50)=\{50,150,\dots\}$$

The two integers in the same box must be in one of the T‘s that are the union of two S‘s, so they must be congruent to \(k\) and \(100-k\) respectively, making their sum divisible by 100.

Subsets with equal sums

Here is another subset question from 2003:

Sets and Subsets

Consider a collection of 26 stones weighing 1, 2, 3, … , 26 grams. Let's denote it by A(26). Prove that any subset C of A(26) consisting of at least 7 stones contains two separate subsets with equal total weights.

Equivalently, and more abstractly, we are considering the set of numbers \(A=\{1,2,3,\dots,26\}\). We want to show that any subset C of A with cardinality at least 7 contains two disjoint subsets with the same sum. (That’s my interpretation of the word “separate”: In this physical context, two “separate” sets of stones, existing at the same time, must be disjoint.)

But in fact we don’t need this restriction: If there are any distinct (not necessarily disjoint) subsets with the same sum, then if we remove their intersection, we will be left with two disjoint subsets with the same sum. So it doesn’t change the problem if we just try to prove this:

Any subset C of A, with cardinality at least 7, contains two distinct subsets with the same sum.

For example, suppose \(C=\{1,4,8,12,13,19,25\}\). Then the subsets \(\{1,4,13,25\}\) and \(\{4,8,12,19\}\) both add up to 43. But these both contain 4; removing that from both, we find that the disjoint subsets \(\{1,13,25\}\) and \(\{8,12,19\}\) both add up to 39.

Before looking at the answer, let’s give it some thought ourselves. My first idea is to count how many subsets C can have (assuming it has 7 elements), and how many possible sums there are. We hope there will be more subsets than sums, so the Pigeonhole Principle will finish the work. At least, it may give us a starting point.

First, how many subsets, of any size, are there? That would be \(2^7=128\); for each element, we can choose to include it or not. This includes the empty subset and all of C; since those can’t have the same sum as any other subsets, we might as well consider only non-empty proper subsets, of which there are 126.

Now, how many sums can such a subset have? That depends on the actual contents of C, since some subsets may have the same sum; but we know that the largest possible sum of any such C is \(\{20,21,22,23,24,25,26\}\), whose sum is 161. So, in principle, the sum of a (non-empty proper) subset of C might be anything from 1 through 160. That doesn’t help; we have 126 balls (subsets) to put in 160 boxes (sums), which we can do easily without doubling up.

Now what?

Doctor Jacques answered, giving the start of a solution, leaving a lot for us to think about. He starts with a quick simplification, similar to mine about disjoint vs. distinct:

It is enough to consider subsets of exactly 7 stones (if there are more, we just ignore the others).

Once we prove the claim for sets of 7, it will necessarily be true for any larger set, since a subset of the first 7 elements is a subset of the entire thing. I’d been accidentally assuming this, but it’s good to know that, too, doesn’t change the problem.

Now he looks at subsets of 4 elements:

If the subset C contains {23, 24, 25, 26}, we have:

 (23 + 26) = (24 + 25)

and we are done.

So we can suppose that C does not contain all of the largest four possible numbers. But how is this helpful?

Otherwise, the weight of any subset of C of at most 4 elements will be between 1 and:

  (23 + 24 + 25 + 26) - 1 = 97

i.e. that weight can take at most 97 values.

Why does he consider only subsets of up to 4 elements? I suspect that it is because he tried various possibilities and discovered that this will work! If we can prove that there will be two distinct subsets of this size with the same sum, that will be enough to complete our proof. And if this didn’t work, we could try a different size.

So there are no more than 97 possible sums. (It may actually be less, for particular sets C.) How many subsets are there?

Let's count the number of subsets of at most 4 elements. The number of subsets of k elements is:

  C(7,k) = 7! / (k! * (7-k)!)

If we compute these values for k = 1, .. 4, we get:

  k    C(7,k)
-------------
  1      7
  2     21
  3     35
  4     35

for a total of 98.

This means that we have 98 different non-empty subsets that can take at most 97 different values.

Can you continue from here? Please feel free to write back if you want further help.

So we can call subsets of 1, 2, 3, or 4 elements “balls” and sums “boxes”; there are 98 balls to fit into 97 boxes, so two such subsets must have the same sum. So the claim has been proved.

Let’s see if a different number would have worked. Since, between two disjoint subsets, the smaller one will have no more than 3 elements, we can count those sets:

This smaller subset can have a sum of at most \(24+25+26=75\), so there are 75 possible sums. And the number of possible smaller subsets is $$_7\mathrm{C}_0+_7\mathrm{C}_1+_7\mathrm{C}_2+_7\mathrm{C}_3=1+7+21+35=64,$$ so there are only 64 subsets, and the Pigeonhole Principle doesn’t apply. So it turns out that 4 elements is the magic number. Sometimes that’s how things work.

Subsets with equal sums, slowly

This brings us to a very similar (but considerably easier!) question from two years ago (August 2023), where we’ll be guiding a student to learn to write such a proof. Kalyan was working through the book Proofs: A Long-form Mathematics Textbook, by Jay Cummings, and was unsure of an example there:

Doctor, I want to know something regarding the pigeonhole principle and how that is being used in the instances of math problems such as the one given here:

Observe that this is stated abstractly, and makes it clear that X and Y need not be disjoint and the size of A is exactly 10 rather than at least 10, but otherwise is the same as the last problem with the numbers 26 and 7 replaced with 100 and 10.

This was in the midst of a long discussion with Doctor Rick, who chose to take our usual approach:

This is a nice problem, and thank you for pointing out the pigeonhole principle specifically as a direction to pursue. Here is a good mathematical statement of the principle:

If there is a mapping between two finite sets of unequal size, then at least one element in the smaller set must be paired with more than one element in the larger set.

I am not going to tell you what I see here; rather I will do what we at The Math Doctors prefer — I’ll guide you as you do the thinking. Please do not look elsewhere for a solution; even if it’s given in the book, don’t look at that. I promise we will get there, but as I have said, you need practice in discovering things for yourself.

In effect, we want to set up a mapping between the problem and the principle. The principle applies to a pair of finite sets of different size. What two sets of different size might be relevant to the problem? That’s all I will say for now.

The proof was indeed given, as this was an example, not an exercise. But in reading a textbook, it is often helpful to think through a problem before looking at the answer given, and that is the advice here. We want to learn from experience how to identify the “balls” and the “boxes”.

Notice that the version of the principle Doctor Rick found is stated abstractly. (It appears that Cummings’ own statement of the principle is more along the lines of “balls and boxes”, but we didn’t have that. The equivalence should be clear.)

Looking for two sets (balls and boxes)

Kalyan made a start, by thinking about the number of possible subsets, and the number of possible sums:

I tried my approach to reach a point where I can find the sum of the numbers in a subset X and Y can possibly take to. 

Next I do not how to proceed further because every combination will have one sum value. Of course, there are repetitions in lot of combinations , but I do not how to guess those , remove the duplicates and reach the right answer.

We see some appropriate ideas here, but not all the right things are being counted.

Doctor Rick replied, analyzing what Kalyan has done:

You did not directly answer my question:

The principle applies to a pair of finite sets of different size. What two sets of different size might be relevant to the problem?

Therefore, I must examine your work to see if I can find an answer hidden there.

I see that you found the number of possible 10-element subsets of A. You also found, more or less, the number of possible sums of 10 numbers from 1 through 100. Thus it appears that you have chosen these two sets:

T: the set of 10-element subsets of {1, 2, 3, …, 100}

S: the set of sums of 10 elements of {1, 2, 3, …, 100}

OK, how can the pigeonhole principle be applied to these sets? From your work, |T| > |S|, so the principle tells us that there must be at least one pair of elements of T that map to the same element of S. That is, there must be at least two sets of 10 elements of A such that the sum of the elements in each is the same.

Is this what the problem is asking about? I need to go back and look at the problem, because it’s easy to forget what we’re looking for! Here is what we need to prove:

Given any A ⊆ {1, 2, 3, …, 100} for which |A| = 10, there exist two different subsets X ⊆ A and Y ⊆ A for which the sum of the elements in X is equal to the sum of the elements in Y.

No, the problem is about two subsets of any given 10-element subset A, not about two 10-element subsets. You may not be far off, though; keep thinking.

So we need to count, not 10-element subsets (A), but subsets (X or Y) of a given 10-element set A.

Refining the proof

Kalyan responded:

For solving this problem using the pigeonhole principle:

I want to fit in the sums from 0 to 955 in 1024 boxes which is the boxes of the subsets from {} to {91,92,93………..100}; similarly, the same is carried out for other combinations. As our goal is to find the at least of the same sum of the elements in the two sets and respectively.

So, 1024 ≡ 69 mod ( 955), similarly the other combinations will also give at least a same sum between 0 and 955.

The number 1024 is the correct number of subsets of A, namely \(2^{10}\). The rest is unclear.

Doctor Rick responded first to the general strategy:

You might have the correct idea, but you have not convinced me that you do. Please state it clearly: What set of subsets are you talking about? Do not merely give an example, particularly one that is not clear: “the subsets from {} to {91,92,93………..100}” could refer to all subsets of {1, 2, 3, …, 100}, but I don’t think that’s what you mean.

If you want to learn to write proofs, you need to learn to write precisely.

Examples can be helpful to clarify, but it must be stated what they are examples of.

Then, to the last line:

I have no idea how modular congruence has any connection with the problem at hand. Can you explain this further?

Most likely this idea came from other problems in which congruence did play a role, such as our first example above; I commonly see students figuratively reaching for a tool that was left on the bench from a previous project, rather than look at the work at hand for inspiration as to what tool would be most useful!

Kalyan tried again:

The set of subsets for example: {1, 2, 3, 4, …, 10}

As I told earlier there will be several other combinations, but all the sums of different combinations of numbers (subsets of numbers, X, created from A) will be between 0 and 955.

We have to put all those sets whose sum lies between this sum range, respectively.

There are total 1024 subsets for one particular combination.

So, for a set like: A = {1, 2, 3, 4, …, 10}, the subsets could be X = {1, 2} and Y = {3}; both of the subsets will move to box 3. It is proved that at least one pair of sets have the same sum. Just like this, there will be many other sets too.

The basic ideas here are correct, but they are obscured by the examples.

Doctor Rick reconsidered what Kalyan was asking for:

I believe you do understand the proposition and why it is true. However, what you have written is not a proof.

Perhaps you have a proof already but you wanted to understand it. If so, I would like to see the proof you have, and then perhaps I can say more about it.

But if you do want to write your own proof, keep working. Don’t focus on examples, but rather statements that follow logically from the givens and the theorem (pigeonhole principle).

Learning to write clear proofs requires practice (and also experience reading proofs, to see how they are typically expressed).

Friendly and formal proofs

Kalyan rephrased his request:

Doctor, I have the proof in the book. It is an example problem not from the exercise.

Yet, I used this to see whether it matches the book’s method. It matched, but after that I wrote the proof part while replying to you just as given in the book. After your reply and noting the other examples in the book, I found that a proof should never have examples but it should rather have a logical flow. I don’t how to start the proof in a needed manner.

Here is the book’s proof, which he attached:

This is the direct sort of proof that we initially tried for the last question; this time it works. Note that, as Doctor Rick recommended, this proof isn’t based on examples (though one could have been included as a clarifying aside). But also, this proof is not written in a formal style, but in what I think of as a friendly style. Formality is not necessary, though it should be learned if one is to write proofs for mathematicians to read.

Doctor Rick responded with the start of a proof:

OK, I will show you how I would start the proof, in a somewhat more formal manner than the text. I will start by explicitly stating the two sets to which we want to apply the Pigeonhole Principle, as I did with the two sets that I inferred from your initial attempt.

To prove: Given any A ⊆ {1, 2, 3, …, 100} for which |A| = 10, there exist two different subsets X ⊆ A and Y ⊆ A for which the sum of the elements in X is equal to the sum of the elements in Y.

Consider the two sets:

T: the set of subsets of a given set A as defined above

S: the set of sums of no more than 10 elements of {1, 2, 3, …, 100}

The number of elements of T is … because …

The number of elements of S is [less than] … because …

Consider a mapping from T to S in which each subset of A is mapped to the sum of its elements. Since |S| < |T|, by the Pigeonhole Principle, there must be …

You can fill in the details.

Note that I suggested the words “less than” because we don’t really need to get an exact number for |S| as the text did. We only need to show that it is less than some number that is less than |T|. When I first saw the problem, I did not try to be precise; there was an easy overestimate that turned out to be sufficient.

Let’s finish it.

The number of elements of T (subsets of A) is 1024, because \(2^{10}=1024\).

The number of elements of S (sums) is less than 1000, because the smallest possible sum is 0 (for an empty set), and the largest possible sum is \(91+92+…+100<10(100)=1000\). [This is probably his “easy overestimate”.]

Consider a mapping from T to S in which each subset of A is mapped to the sum of its elements. Since |S| < |T|, by the Pigeonhole Principle, there must be at least two elements of S (subsets) corresponding to the same element of T (sum).

If the overestimate were greater than 1024, we would use the exact sum of 955 instead.

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.