In number theory, Dixon's factorization method (also Dixon's random squares method[1] or Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method. Unlike for other factor base methods, its run-time bound comes with a rigorous proof that does not rely on conjectures about the smoothness properties of the values taken by a polynomial.
The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.[2]
Dixon's method is based on finding a congruence of squares modulo the integer N which is intended to factor. Fermat's factorization method finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x2 mod N is a perfect square (in the integers):
For example, if N = 84923, (by starting at 292, the first number greater than √N and counting up) the 5052 mod 84923 is 256, the square of 16. So (505 − 16)(505 + 16) = 0 mod 84923. Computing the greatest common divisor of 505 − 16 and N using Euclid's algorithm gives 163, which is a factor of N.
In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are only √N squares less than N.
Dixon's method replaces the condition "is the square of an integer" with the much weaker one "has only small prime factors"; for example, there are 292 squares smaller than 84923; 662 numbers smaller than 84923 whose prime factors are only 2,3,5 or 7; and 4767 whose prime factors are all less than 30. (Such numbers are called B-smooth with respect to some bound B.)
If there are many numbers a 1 … a n {\displaystyle a_{1}\ldots a_{n}} whose squares can be factorized as a i 2 mod N = ∏ j = 1 m b j e i j {\displaystyle a_{i}^{2}\mod N=\prod _{j=1}^{m}b_{j}^{e_{ij}}} for a fixed set b 1 … b m {\displaystyle b_{1}\ldots b_{m}} of small primes, linear algebra modulo 2 on the matrix e i j {\displaystyle e_{ij}} will give a subset of the a i {\displaystyle a_{i}} whose squares combine to a product of small primes to an even power — that is, a subset of the a i {\displaystyle a_{i}} whose squares multiply to the square of a (hopefully different) number mod N.
Suppose the composite number N is being factored. Bound B is chosen, and the factor base is identified (which is called P), the set of all primes less than or equal to B. Next, positive integers z are sought such that z2 mod N is B-smooth. Therefore we can write, for suitable exponents ai,
When enough of these relations have been generated (it is generally sufficient that the number of relations be a few more than the size of P), the methods of linear algebra, such as Gaussian elimination, can be used to multiply together these various relations in such a way that the exponents of the primes on the right-hand side are all even:
This yields a congruence of squares of the form a2 ≡ b2 (mod N), which can be turned into a factorization of N, N = gcd(a + b, N) × (N/gcd(a + b, N)). This factorization might turn out to be trivial (i.e. N = N × 1), which can only happen if a ≡ ±b (mod N), in which case another try must be made with a different combination of relations; but if a nontrivial pair of factors of N is reached, the algorithm terminates.
This section is taken directly from Dixon(1981).
Dixon's Algorithm
Initialization. Let L be a list of integers in the range [1, n], and let P = {p₁, ..., pₕ} be the list of the h primes ≤ v. Let B and Z be initially empty lists (Z will be indexed by B).
Step 1. If L is empty, exit (algorithm unsuccessful). Otherwise, take the first term z from L, remove it from L, and proceed to Step 2.
Step 2. Compute w as the least positive remainder of z² mod n. Factor w as:
w = w ′ ∏ i p i a i {\displaystyle w=w'\prod _{i}p_{i}^{a_{i}}}
where w′ has no factor in P. If w′ = 1, proceed to Step 3; otherwise, return to Step 1.
Step 3. Let a ← (a₁, ..., aₕ). Add a to B and z to Z. If B has at most h elements, return to Step 1; otherwise, proceed to Step 4.
Step 4. Find the first vector c in B that is linearly dependent (mod 2) on earlier vectors in B. Remove c from B and z c {\displaystyle z_{c}} from Z. Compute coefficients f b {\displaystyle f_{b}} such that:
c ≡ ∑ b ∈ B f b b ( mod 2 ) {\displaystyle \mathbf {c} \equiv \sum _{b\in B}f_{b}\mathbf {b} {\pmod {2}}}
Define:
d = ( d 1 , … , d n ) ← 1 2 ( c + ∑ f b b ) {\displaystyle \mathbf {d} =(d_{1},\dots ,d_{n})\gets {\frac {1}{2}}\left(\mathbf {c} +\sum f_{b}\mathbf {b} \right)}
Proceed to Step 5.
Step 5. Compute:
x ← z c ∏ b z b f b , y ← ∏ i p i d i {\displaystyle x\gets z_{c}\prod _{b}z_{b}^{f_{b}},\quad y\gets \prod _{i}p_{i}^{d_{i}}}
so that:
x 2 ≡ ∏ i p i 2 d i = y 2 mod n . {\displaystyle x^{2}\equiv \prod _{i}p_{i}^{2d_{i}}=y^{2}\mod n.}
If x ≡ y {\displaystyle x\equiv y} or x ≡ − y ( mod n ) {\displaystyle x\equiv -y{\pmod {n}}} , return to Step 1. Otherwise, return:
gcd ( n , x + y ) {\displaystyle \gcd(n,x+y)}
which provides a nontrivial factor of n, and terminate successfully.
This example is lifted directly from the LeetArxiv substack.[3] Credit is given to the original author.
int n = 84923; for(int i = 1; i <= n; i++) { int z = i; }
Quick check shows 84923 = 521 × 163 {\displaystyle 84923=521\times 163} .
The quadratic sieve is an optimization of Dixon's method. It selects values of x close to the square root of N such that x2 modulo N is small, thereby largely increasing the chance of obtaining a smooth number.
Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number z cannot have more than log 2 z {\displaystyle \log _{2}z} factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.
A more sophisticated analysis, using the approximation that a number has all its prime factors less than N 1 / a {\displaystyle N^{1/a}} with probability about a − a {\displaystyle a^{-a}} (an approximation to the Dickman–de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of exp ( log N log log N ) {\displaystyle \exp \left({\sqrt {\log N\log \log N}}\right)} .
The optimal complexity of Dixon's method is
in big-O notation, or
in L-notation.