This is the tenth problem of Project Euler1, a website launched in 2001 with over 600 math/programming challenges.
This is meant to be solved using a computer (it’s impossible otherwise), and while I will be using python, no prior knowledge of programming or computer science is required to appreciate the problem and its solution.
At a very basic level, a computer program is a list of instructions for the computer to follow. You might say “print the value of 2+2,” and the computer will do exactly that:
So let’s try to make a list of instructions to find the sum of all primes under 2 million:
- Start with one
- If this number is prime, add it to a running total.
- If this number is equal to 2,000,000, go to 4. Otherwise, add one2 to the number and go to 2.
- Print the running total.
The problem here is that the computer has no way of knowing if a number is prime or not. We have to tell it (with a list of instructions) how to differentiate between prime and composite numbers. But before I tell a computer how to do that, let's think about how you might do it.
If I gave you a number, 207, how would you go about (dis)proving it's primality? I think most people would have an inner dialogue that goes something like this:
"Well it's not divisible by 2, but it is divisible by 3, so it's composite."
You just keep on going through the numbers until you find a number that divides into it or you reach the square root of the number.3
This works! it's actually fairly easy to program, and it correctly identifies numbers as either prime or composite:
So let's try it out in our previous code:
You'll notice that it isn't showing an answer. That’s because it's still working on it. Ahh it's fine. I'll just leave it for an hour.
Still not finished.
The problem with this approach is that it doesn't scale well. In testing the primality of 13, the computer only has to check 2 numbers: 2, and 3. This makes for quick calculations. However, while confirming that 996563 is prime, the computer would have to check nearly a thousand numbers.
While my computer certainly can make 1000 calculations in a blink of an eye, you start to run into problems when you ask it to do that 2 million times.
We need a different approach. Whatever method we choose, it has to scale well, meaning it shouldn't take too much longer to calculate the first 100 primes than it takes to calculate the first 2000.
The4 solution is pretty clever. Instead of testing if a number is prime, one by one, start with all the numbers, then remove all composite numbers.
This process is called the “sieve of Eratosthenes,” after an ancient greek mathematician. Here’s how it goes:
Assume all numbers are prime.
Start with 2. double it to get 4 and mark it as composite. Then add 2 to that and mark 6 as composite. Continue this process until you've marked all even numbers as composite.
Repeat step 2 with the next prime number.5 (eventually, you stop when the numbers become suitably big)
The wikipedia gif6 is so fantastic, I have to show it:
What’s wonderful about this is that it only takes a couple seconds to label every number from 2 to 2 million as either prime or composite.
Here's the implementation:
And that’s how you use a process devised over 2000 years ago to solve a programming problem. Math is awesome.
1named after the Swiss mathematician.
2or start at 3, with increments of 2. You just have to be sure to start the running total at 2.
3because any factor above the square root has a corresponding factor below the square root.
4or at least “a”)
5I can hear you saying "But we're trying to find the prime numbers. You can't tell me to find the prime numbers by finding the next prime number!" remember, we assumed that all the numbers are prime in step 1, so if it isn't marked, then we consider it prime.
6you’re pronouncing it wrong.