Algorithm Sharing 01: Sieve of Eratosthenes Algorithm (Sieve Method) [Easy]
Algorithm Application Scenarios
The Sieve of Eratosthenes algorithm excels at efficiently recording prime numbers (including their quantity) in related problems. This ancient mathematical technique remains highly relevant in modern computational contexts, particularly when dealing with problems requiring identification or counting of prime numbers within specified ranges.
Core Philosophy
Eliminate composite numbers, leave prime numbers.
The fundamental insight driving this algorithm proves elegantly simple yet profoundly powerful: if x is a prime number, then all multiples of x must not be prime numbers. Therefore, these multiples can be directly excluded without needing to enumerate them one by one in subsequent iterations, significantly reducing computational overhead.
Since we need to mark all multiples of x as non-prime, we obviously need to create a container before sieving to preserve the primality status of all numbers (referring to all numbers within the required range).
We implement this through an array isPrime[]. If number i is prime, then isPrime[i] equals 1; otherwise, it equals 0.
We then traverse from smallest to largest. If a number is prime, we mark all its multiples (excluding the prime number itself) as composite (that is, set the corresponding array index value to 0).
Once we complete the traversal, any array position with value 1 represents a prime number. Operations like recording prime quantity or outputting all primes become straightforward.
Correctness Explanation
From the basic philosophy described above, we can understand that this algorithm selects composite numbers from within and marks them as 0. Therefore, when the array is initially created, we first mark everything as 1—all numbers are initially assumed to be prime. Then, through the algorithm, we identify confirmed primes and use them to filter out composite numbers, marking them as 0. Thus, throughout the entire process, array values continuously change from 1 to 0—essentially picking out composite numbers as mentioned at the beginning.
When first encountering this method, one might naturally feel a sense of uncertainty: why can we definitively determine that when traversing to a certain prime number (that is, when the array value equals 1), it must actually be prime? Could it be that it was originally composite, but previous traversal operations failed to discover it, thus becoming a fish that escaped the net?
Let's investigate this thoroughly: First, this method will definitely not mark prime numbers as composite. When we traverse from smallest to largest, if number x is composite, then according to the nature of composite numbers, x must have at least one prime factor (note that 1 is not prime). Let's denote this prime factor as y. Then, during traversal from smallest to largest, we will encounter y before x. When marking multiples of y as composite, we will definitely mark x as composite. Therefore, no composite numbers will escape detection and be mistakenly treated as prime numbers.
This logical guarantee ensures the algorithm's absolute correctness—every composite number will be discovered and marked through its smallest prime factor.
Optimization Strategies
Once we understand why its correctness holds, subsequent optimization operations become easy to comprehend.
For a prime number x, marking from 2x as composite according to the previous method actually involves duplicate operations. Could it be that this 2x was already marked as composite previously? If so, wouldn't this operation be wasted effort?
Now we need to research from what point to start marking ensures neither omission nor overlap.
Actually, from the previous section's statement that composite number x must have at least one prime factor y, we can guess: numbers before x×x don't need marking. This is because for numbers k×x where k takes values from 2 to x-1, k is less than x. Two situations exist:
First situation: k is prime. Then when traversing to k, k×x will definitely be marked as composite.
Second situation: k is not prime. Then as stated earlier, k must have a prime factor y. Therefore, k×x must also have a prime factor y. Obviously, when traversing to y, k×x will definitely be marked as composite.
Meanwhile, x×x has only factors x and 1, so it can only be marked as composite by x. Therefore, starting marking from x×x both guarantees that previous numbers have been marked and ensures that the starting point x×x can only be marked by x. Thus, starting from x×x represents the optimal "starting point."
This optimization reduces redundant operations significantly, particularly for larger prime numbers where the improvement becomes substantial.
Algorithm Steps
Step 1: Initialize the Sieve
Create a boolean type array (int is also acceptable) isPrime[n + 1] with length n + 1. The array's index represents the corresponding number, while the array's value indicates whether the corresponding number is prime (for example, isPrime[11] = 1 means number 11 is prime).
Initially, set the entire array to true. However, note that positions with indices 0 and 1 must be set to false (these two numbers are defined as non-prime).
Step 2: Sieve Out Composite Numbers
Traverse from i = 2 to √n. If isPrime[i] = true, then mark all multiples of i as false. Of course, after optimization, start marking from i×i as false.
The traversal only needs to continue to √n because any composite number larger than √n must have a factor smaller than or equal to √n, which would have already been processed.
Step 3: Result Extraction
After traversal completes, all indices with value true represent all prime numbers between 1 and n.
Example Code (C++)
class Solution {
public:
int Primes(int n) {
if (n <= 2) return 0;
if (n == 3) return 1;
vector<bool> IsPrime(n, true);
for(int i = 3; i * i < n; i += 2){
if(IsPrime[i]){
for(int j = i * i; j < n; j += i){
IsPrime[j] = false;
}
}
}
// At this point, all numbers in the container have been filtered
// Next, perform desired operations according to the problem requirements
// For further efficiency, some operations can even be performed
// synchronously during the filtering process, but ensure that
// operated numbers have already been filtered
}
};Code Explanation:
The implementation starts by handling edge cases where n is 2 or less (return 0) or exactly 3 (return 1). A boolean vector tracks primality status for all numbers up to n.
The outer loop iterates through odd numbers starting from 3 (since 2 is the only even prime and is handled implicitly). The condition i * i < n implements the optimization discussed earlier—we only need to check up to the square root of n.
The inner loop marks all multiples of each prime as composite, starting from i² as optimized. The increment j += i efficiently steps through all multiples.
Complexity Analysis
Time Complexity: O(n log log n)
The time complexity analysis reveals why this algorithm proves so efficient. The harmonic series nature of the sieving process results in the log log n factor. For each prime p, we perform approximately n/p operations. Summing across all primes up to n gives us n × (1/2 + 1/3 + 1/5 + 1/7 + ...) which approximates to n log log n.
This represents near-linear performance, making the algorithm highly scalable for large input sizes.
Space Complexity: O(n)
The space requirement grows linearly with input size, as we need one boolean value per number in the range. For most practical applications, this space requirement remains manageable even for large values of n.
Reference Problems
For practical application and testing of this algorithm, consider working through the following problem:
LeetCode 204: Count Primes
This classic problem directly applies the Sieve of Eratosthenes algorithm, requiring you to count all prime numbers less than a given non-negative integer n. It serves as an excellent exercise for understanding both the basic implementation and optimization techniques discussed in this article.
Practical Considerations
When implementing this algorithm in production environments, several practical considerations deserve attention:
Memory Optimization: For extremely large values of n, consider using bit arrays instead of boolean arrays to reduce memory consumption by a factor of 8.
Parallel Processing: The sieving process can be parallelized for certain ranges, though care must be taken to avoid race conditions when marking composite numbers.
Segmented Sieve: For very large ranges where storing the entire array proves impractical, the segmented sieve variant processes the range in manageable chunks while maintaining the algorithm's core logic.
The Sieve of Eratosthenes remains one of the most elegant algorithms in computational mathematics—simple to understand, efficient in practice, and beautiful in its mathematical foundation. Its两千-year history demonstrates the timeless nature of well-designed algorithms.