Prime Factorization of a Number (Advanced)

Last time we looked at basic methods for finding the prime factorization of a number. Here we will look at some special techniques for large numbers, demonstrating them for not-too-large numbers. This takes us a step beyond previous tests that told us whether a number was composite, without actually factoring them.

Ad hoc thinking using a difference of squares

Here is a 1999 question about a special number:

Prime Factors of 4,194,305

I am in 8th grade and taking Algeom (algebra and geometry in one year).  Our teacher assigned the problem:

     Prime Factor 2^22+1

I have spent about 2 hours on the problem so far, and have talked to 2 of my classmates about it. I have been unable to make any progress other than finding out that one of the factors is 5. I worked out the number, which is 4194305, which is obviously divisible by 5.  I then found a program on the Internet that gives the prime factorization of a number. I entered my number and got 5*397*2113. We are supposed to show how we got our answer. I would like to know if you have a clever way to do this problem. I would appreciate it greatly.

It was easy to find the first factor, 5; but we need a powerful method to factor the rest, $4,194,305\div5=838,861$.

Doctor Schwa answered with a custom-made approach based on the known form of the number:

Interesting question, and I like your approach.

I tried to do it with a little algebra. Maybe you've learned by now that (x+1)^2 = x^2 + 2x + 1. Well, 2^22 + 1 looks sort of like that, except there's no 2x there. Rats. I thought for a while about how to put a 2x in there, and finally decided on this wishful thinking strategy:

2^22 = (2^11)^2, so if x = 2^11, then I have x^2 + 1. How do I get the 2x? I just stuck it in there. But wait, that's not fair, so I better subtract it, too, like this:

     2^22 + 1 = 2^22 + 2*2^11 + 1 - 2*2^11.

Now I have that perfect square factoring pattern, but with a -2*2^11 left outside:

              = (2^11 + 1)^2 - 2*2^11.

And 2*2^11 = 2^12, so

              = (2^11 + 1)^2 - 2^12

But 2^12 = (2^6)^2, so

              = (2^11 + 1)^2 - (2^6)^2

and now I use the 'Difference Of Two Squares' factoring pattern:

              = (2^11 + 1 + 2^6)(2^11 + 1 - 2^6)

which equals

              = (2113)(1985)

or (2113)(397)(5), the same answer you got.

Thanks for a fun problem. I wonder if my method is what your teacher had in mind?

Doctor Schwa used a very similar approach to factor a quartic polynomial here (near the bottom).

We’ll see another way using the difference of squares below.

Now, what if the number is really large, and doesn’t have a special form?

Two methods by Fermat

Here is a question from 2000, in which we’ll apply two methods to the same number, 1037:

Factoring Large Numbers

I am doing a report on Fermat and it says he developed a method for factoring large numbers. The theorem goes something like this:

"If p is a prime number, a is an integer, and p is not a divisor of a, then p is a divisor of a^(p-1) - 1."

However I don't completely understand how this is used. I can see how you can take a number, then find another number that has the original number as a factor, but I do not understand how it is used to take a number and find its factors. Can you help me out?

Doctor Rob answered:

Thanks for writing to Ask Dr. Math, Gabriel.

There are two separate ideas here. One is the factoring method, so far unspecified. The other is the theorem that you quote, which is known as Fermat's Little Theorem. They are related in only an indirect way.

Fermat's Little Theorem is used as a test of compositeness, as follows. You have a number p that you suspect may be prime, but don't know. You pick a number  a  which is relatively prime to p, that is, has no common factor with p. You compute the remainder when a^(p-1) - 1 is divided by p. If this is not 0, then the contrapositive of the theorem tells you that p is *not* prime after all, but is composite. If the remainder *is* 0, you aren't sure whether or not p is prime, but there is an excellent chance that it is.

Now if you are faced with a number to factor, you may want to apply the above test. If it is composite, you will go ahead and try to factor it. If it is most likely prime, you wouldn't waste any work trying to factor it without first trying to prove that it is prime, which is much more likely. There certainly is no point trying to factor prime numbers.

We saw this a couple weeks ago.

1: Search near the square root

Fermat devised two factoring algorithms that I know about. The one with which his name is usually associated goes like this.

Start with N, the composite number to be factored.  Let a be the biggest integer with a^2 < N. Write down the following table:

     k   a+k   a-k   GCD(a+k,N)   GCD(a-k,N)
     0
     1
     2
     3
     4
    etc.

and fill in the blanks in the table. If any of the GCD's are greater than 1, you have found a factor of N. 

For example, if one tries to factor 1037 by this method, you'll start with a = 32,

     k   32+k   32-k   GCD(32+k,1037)   GCD(32-k,1037)
     0    32     32           1                1
     1    33     31           1                1
     2    34     30          17

so 1037 = 17*61.

Here, he used $a=32$ because $32^2=1024<1037$, then looked for a nearby number that has a common factor with the given $N=1037$. When such a number is found, that common factor is what we’re looking for.

The idea here is to start looking for factors assuming a pair of similarly large factors, rather than a small prime. If a pair of factors are close together, then they will be close to the square root; so we start our search there. So this is a sort of “hopeful guess” strategy.

But how do you find the GCD without already knowing factors? You use the Euclidean algorithm for the GCD, which just uses division and remainders. In this case, to find $GCD(34,1037)$, we do this:

$$1037\div34=30\text{ R}17\\34\div17=2\text{ R }0$$ Since the GCD is 17, not 1, we have found a factor: 17.

2: Difference of squares

The other method is called the Difference of Squares Method. The general idea is to write N = a^2 - b^2 = (a+b)(a-b). If you can find a and b which makes this true (aside from b = (N-1)/2, a = (N+1)/2, which aren't helpful), then you have factored N. Fermat did this by rewriting the equation in the form a^2 - N = b^2. Then he started with the smallest positive a with a^2 >= N. If a^2 - N is a perfect square, then that a and b = sqrt(a^2-N) will work. If it is not a perfect square, then up a by 1 and try again. The work is made easier if, instead of squaring (a+1) and then subtracting N from that, you simply add 2a + 1 to the previous value, because (a+1)^2-N = (a^2-N)+2a+1. 

For example, if one tries to factor N = 1037 by this method, you'll start with a = 33, and the work will look like this:

      a   a^2 - N     2a + 1 
     33      52         67       52 + 67 = 119
     34     119         69      119 + 69 = 188
     35     188         71           etc.
     36     259         73
     37     332         75
     38     407         77
     39     484 = 22^2

so 1037 = 39^2 - 22^2 = 61*17.

You see that neither of these methods is related to Fermat's Little Theorem.

This method is sometimes just called Fermat’s Factorization Method; while it can be slow when there are small factors, it is useful after an attempt at trial division, which will catch that case.

Here, you need some way to take approximate square roots, whether a calculator or one of the available manual algorithms. Testing the root obtained to the nearest whole number will show that it is exact.

Four advanced methods

Another question, from 2003, will give us four methods, all demonstrated for 1189:

Factoring Very Large Numbers into Primes

Suppose I have a really big number to be factored into primes.  What is the best possible way of doing it?  I know I can divide by increasing primes, but that gets very hard for monstrous numbers.  Is it possible to find them in any other way?

Doctor Rob answered:

Thanks for writing to Ask Dr. Math, Chetan!

There are a number of ways to approach finding prime factors of large natural numbers.  I will discuss several, including some that are based in a field of mathematics called modular arithmetic.

The one you know, where you divide successively by 2, 3, 5, 7, 11, 13, 17, 19, 23, ..., the prime numbers in order, is called Trial Division.  It is very effective for numbers up to about 1000 when doing the divisions by hand, and to about 1,000,000 when using a computer.  The worst-case time to factor a composite N is sqrt(N) divisions.  The smallest prime factor is always found first.

For more than a million, you would need to have, and use, a list of primes through 1000.

1: Difference of squares, again

An old-fashioned way which is often effective is to express N as the difference of two squares, that is, to find x and y such that N = x^2 - y^2 = (x - y)*(x + y).  One way to do this is to start with x being the integer just larger than sqrt(N), and see if x^2 - N is or is not a perfect square.  If it is, you have found y^2.  If it is not, replace x by x + 1, and try again.  There are some shortcuts which you'll soon discover, since (for example) perfect squares must end in 00, n1, n4, 25, m6, or n9, where n is an even digit and m is an odd one.  Even better, one can eliminate whole congruence classes of possible values of x by reducing the equation modulo a small prime, and using the fact that both x^2 and y^2 must be quadratic residues modulo that prime.

For example, suppose N = 1189.  Then start with x = 35, x^2 - N = 1225 - 1189 = 36 = 6^2.  We've been lucky and got a success on the very first value of x!  Then N = (35 - 6)*(35 + 6) = 29*41.

After this, of course, you would see if you can factor either of the factors just found.

In the version we saw before, we used a shortcut by adding $2x+1$ to $x^2$, rather than doing the harder work of directly calculating $(x+1)^2$. Here, some new shortcuts are mentioned, though we didn’t have to use them: Only check if the newly found y is a square when the last two digits make that possible. In the earlier example, half of the numbers would have been excluded, saving the work of finding square roots.

2: Sums of squares

The following method is attributed to Euler:

Another interesting way is to express N as the SUM of two squares in two different ways:

   N = a^2 + b^2 = c^2 + d^2, 0 < a < c < d < b.

This can only work if N = 1 (mod 4).  Then

   N = ((bc + ad)(bc - ad))/((b + d)(b - d)).

The factors in the denominator will cancel with some in the numerator, but what is left will still be a useful factorization of N into two parts.

Not all numbers can be written this way, of course; but if it can be done, then this method can be faster than Fermat for numbers whose factors are not close together. We find that $$a^2+b^2=c^2+d^2\Rightarrow b^2-d^2=c^2-a^2$$ and $$\frac{(bc+ad)(bc-ad)}{(b+d)(b-d)}=\frac{b^2c^2-a^2d^2}{b^2-d^2}\\=\frac{b^2c^2-c^2d^2+c^2d^2-a^2d^2}{b^2-d^2}\\=\frac{b^2c^2-c^2d^2}{b^2-d^2}+\frac{c^2d^2-a^2d^2}{b^2-d^2}\\=\frac{(b^2-d^2)c^2}{b^2-d^2}+\frac{(c^2-a^2)d^2}{c^2-a^2}\\=c^2+d^2=N$$

This is not written as a single product, because factors need to be rearranged to obtain a pair of integers. For example:

Again, if N = 1189, one can find 1189 = 10^2 + 33^2 = 17^2 + 30^2, so

   N = (33*17 + 30*10)*(33*17 - 30*10)/(33 + 30)*(33 - 30),
     = (561 + 300)*(561 - 300)/(63*3),
     = 861*261/(3^3*7),
     = (861/21)*(261/9),
     = 41*29.

This can be generalized to writing

   N = a^2 + k*b^2 = c^2 + k*d^2, 0 < a < c, 0 < d < b.

There are always values of k which can be used, no matter whether N = 1 (mod 4) or N = 3 (mod 4).  Then N can be factored in the same way as above.

3: The Pollard Rho method

For slightly larger numbers, I would try the Pollard Rho Method.  It goes like this.  For factor N, start with x(0) as any convenient number except 0, 1, or -1, and for each n > 0, compute

   x(2n - 1) = x(2n - 2)^2 - 1 (mod N),
   x(2n) = x(2n - 1)^2 - 1 (mod N),
   d(n) = GCD[N, |x(2n) - x(n)|].

Continue this until you find a d(n) with 1 < d(n) < N.  Then d(n) is a factor of N, which is then split into two smaller factors d(n) and N/d(n).  The worst-case time to factor a composite N is N^(1/4) steps.  Small factors tend to be found before large ones, but not necessarily!

What we are doing here is simply making a sequence recursively defined by $$x_n=x_{n-1}^2-1\text{ (mod }N)$$ and comparing a pair of terms, one moving twice as fast as the other, $|x_{2n}-x_n|$. As in the first method above, we would use the efficient algorithm for finding the GCD.

Again, if N = 1189, this proceeds as follows.  Let x(0) = 4.

   n   x(2n-1)   x(2n)   |x(2n) - x(n)|   d(n)
   0                4
   1      15      224         209           1
   2     237      285          61           1
   3     372      459         222           1
   4     227      401         116          29

Thus N = 29*41.

I initially misinterpreted the method, in part due to a typo I have corrected here, and because the fourth column is not what you might expect. I have added color to show what pairs of numbers are subtracted at each step. We can see this more clearly if we write the sequence $x_n$ in a single row:

Although there is much more that could be said about why the method works, and why it is efficient, you might naïvely see it just as a way to randomly choose numbers with which to find the GCD; whenever this turns out to be greater than 1, we have a divisor! (Occasionally it will fail, and you need to pick a different starting number. There are several ways to vary the method.)

This method is discussed a little differently in

Factoring Large Numbers

4: The Pollard p-1 method

Another possibility is the Pollard P-1 method.  To factor N, start with x(1) = a, for some convenient integer a > 1, and for each n > 1, compute

   x(n) = x(n - 1)^n (mod N),
   d(n) = GCD[N, x(n) - 1].

Continue this until you find a d(n) with 1 < d(n) < N.  Then d(n) is a factor of N, which is then split into two smaller factors d(n) and N/d(n).  The worst-case time to factor a composite N is about sqrt(N) steps, but the average time for a random N is much smaller.  Small factors tend to be found before large ones, but not necessarily!

Again, let N = 1189.  Pick a = 2.  Then

   n  x(n)   d(n)
   1     2      1
   2     4      1
   3    64      1
   4   426      1
   5   575     41

So N = 41*29.

Here we are raising $x_n$ at each step to a higher power to get the next (all modulo N). In particular, $x_n\equiv a^{1\cdot2\cdot3\cdots n}$.

We are searching for a prime factor p, which turns out in this example to be $p=41$. We found it at the fifth step because $p-1=40=2^3\cdot5$, whose largest prime factor is 5; we see that $$x_5=2^{1\cdot2\cdot3\cdot4\cdot5}=2^{120},$$ so this is the first time the exponent had 5 as a factor, and thus could be a multiple of $p-1$. (This is why we raise to increasing powers. A fuller form of the method multiplies by successive primes.)

By Fermat’s Little Theorem, $2^{p-1}\equiv1\text{ (mod }p)$, so that $\left(2^{p-1}\right)^{k}\equiv1\text{ (mod }p)$ for any positive integer k; in particular, $$2^{120}=\left(2^{40}\right)^{3}=\left(2^{p-1}\right)^{3}\equiv1\text{ (mod }41),$$ and therefore $$2^{120}-1\equiv0\text{ (mod }41),$$ so that $x_5$ is a multiple of the prime 41. And since 41 is also a factor of our modulus, $N=1189$, we could successfully discover this as a GCD.

Note that this method will not always work at all; it assumes that $p-1$ has a reasonably small factor. But when it does work, it can be very effective.

… and beyond

For really big numbers (scores of digits), the methods of choice are two very complicated methods called the Elliptic Curve Method and the Number Field Sieve.  Explaining them is more than I can do in this message.

There are many other methods, too, which are effective in some circumstances.  Some of their names are Fermat's Method, the Continued Fraction Method, and SQUFOF, among many others.

In any case, once you have split N into two factors, you should test the factors to see if either or both is composite.  If so, one should continue by trying to factor these smaller composite numbers using an appropriate method such as those above.  Compositeness testing is another whole problem!

For some of these methods, as well as those we’ve touched on, see MathWorld here.

Using each method for a larger number

Just for fun, having made a spreadsheet that carries out each of these methods (with manual steps in some cases), I put in the number 838,861 that we had to factor for the first question.

The Fermat search starting at the square root didn’t find anything until $k=121$, at which point I tested $a-k=794$ and found the GCD 397. It took so long because each prime factor, 397 and 2113, is far from the square root, 915. If they had been closer, we would have found one sooner.

The Fermat difference of squares method likewise took a long time; I had to increase a from 916 all the way to 1255, so that $$a^2-N=1255^2-838,861=736,164,$$ and $b=\sqrt{736,164}=858$. Then $$a-b=1255-858=397\\a+b=1255+858=2113.$$

The sums of squares method required decreasing A from 915 to 819, so that $B=410$ and the fraction became $$\frac{686,725\times25,805}{1625\times13};$$ moving factors around, this became $$\frac{686,725}{325}\times\frac{25,805}{65}=2113\times397.$$

The Pollard rho method worked quickly. Starting with $x_0=2$, after 8 terms I got $$x_8-x_4=472,825-3968=468,857,$$ and the GCD was 397.

For the Pollard $p-1$ method, I had to try values of a from 2 through 34, but that one worked quickly; $$x_3=461,315,$$ and the GCD was 397.

I would not want to have done this entirely by hand! But then, I also would not want to have tried dividing by all the primes from 2 to 915 (though I did list them all a couple weeks ago).