Algorithm Sharing 01: Sieve of Eratosthenes - Comprehensive Guide to Prime Number Sieving
Algorithm Application Scenarios
The Sieve of Eratosthenes excels in problems requiring efficient recording of prime numbers, including:
- Counting primes within a given range
- Generating lists of prime numbers up to N
- Prime-related mathematical computations
- Cryptographic applications requiring prime generation
- Competitive programming problems involving prime numbers
This ancient algorithm, despite its age, remains highly relevant in modern computing due to its elegant simplicity and impressive efficiency for finding all primes up to a specified limit.
Core Conceptual Foundation
Fundamental Principle
The sieve operates on an elegantly simple principle: eliminate composite numbers, leaving only primes.
The logical foundation rests on this mathematical truth: If x is a prime number, then all multiples of x (except x itself) cannot be prime. These multiples can be directly excluded without needing subsequent individual verification, significantly reducing computational overhead.
Implementation Strategy
Since we need to record whether multiples of x are prime, we must first create a container to store the primality status of all numbers within our target range.
We implement this through a boolean array isPrime[], where:
- If number i is prime:
isPrime[i] = true(or 1) - If number i is composite:
isPrime[i] = false(or 0)
Algorithmic Process
The algorithm proceeds as follows:
- Initialize all entries to
true(assuming all numbers are prime initially) - Iterate through numbers from smallest to largest
- When encountering a prime number, mark all its multiples (excluding the prime itself) as composite (set corresponding array values to 0)
- After completing the iteration, indices with value 1 represent prime numbers
This approach makes recording prime counts, outputting all primes, and related operations straightforward and efficient.
Correctness Proof
Initial State Analysis
From the basic algorithmic concept, we understand that this method selects composite numbers from within the range and marks them as 0. Since we initialize the array with all values set to 1 (initially assuming all numbers are prime), the algorithm identifies confirmed primes and uses them to eliminate composites, marking them as 0.
Throughout this process, array values only transition from 1 to 0—essentially picking out composite numbers as described initially. This method clearly won't mistakenly mark prime numbers as composite.
Addressing Uncertainty
When first encountering this algorithm, one might feel uncertain: How can we be certain that when we reach a number marked as prime (array value 1), it truly is prime? Could it be that this number is actually composite, but previous iterations failed to identify it, allowing it to slip through as a "fish escaping the net"?
Let's investigate this thoroughly:
Proof of No False Negatives
This method will never mark a prime number as composite.
When iterating from smallest to largest, if number x is composite, then by the nature of composite numbers, x must have at least one prime factor (note: 1 is not prime). Let's denote this prime factor as y.
During iteration from smallest to largest, we will encounter y before x (since y < x). When marking multiples of y as composite, we will definitely mark x as composite. Therefore, no composite number can escape detection and be mistakenly treated as prime.
Mathematical Guarantee
This proof relies on the fundamental theorem of arithmetic: every composite number has at least one prime factor less than or equal to its square root. By systematically eliminating multiples of each discovered prime, we guarantee that all composite numbers will be marked before we reach them in our iteration.
The algorithm's elegance lies in its self-reinforcing nature: each newly discovered prime helps identify more composites, which in turn confirms the primality of remaining unmarked numbers.
Optimization Strategies
Identifying Redundant Operations
Now that we understand why correctness holds, optimization becomes easier to comprehend.
When marking multiples of prime x using the previous method (starting from 2x), we encounter redundant operations: Couldn't 2x have already been marked as composite earlier? If so, this operation would be wasted effort.
Finding the Optimal Starting Point
We need to research: From which point should we start marking to avoid both omissions and overlaps?
From the previous section's insight that composite number x must have at least one prime factor y, we can deduce:
Numbers before x×x don't need marking because for any k×x where k ranges from 2 to x-1:
Case 1: k is prime
- When we iterated to k, k×x would have been marked as composite
Case 2: k is not prime
- As previously established, k must have a prime factor y
- Therefore k×x must also have prime factor y
- Clearly, when we iterated to y, k×x would have been marked as composite
Why x×x is the Optimal Starting Point
The number x×x has only factors x and 1, so it can only be marked as composite by x itself. Therefore, starting marking from x×x guarantees:
- All previous multiples have already been marked
- The starting point x×x can only be marked by x
This makes x×x the optimal "starting point" for marking multiples.
Performance Impact
This optimization significantly improves performance, especially for larger ranges:
- For prime 2: Start marking from 4 (no change)
- For prime 3: Start marking from 9 (instead of 6)
- For prime 5: Start marking from 25 (instead of 10)
- For prime 101: Start marking from 10,201 (instead of 202)
The savings become increasingly substantial as we progress through larger primes.
Detailed Algorithm Steps
Step 1: Initialize the Sieve
Create a boolean array isPrime of length n+1:
vector<bool> isPrime(n + 1, true);Array Structure:
- Array indices represent corresponding numbers
- Array values indicate whether the corresponding number is prime
- Example:
isPrime[11] = truemeans number 11 is prime
Initial Configuration:
- Set all array values to
trueinitially - Important: Set indices 0 and 1 to
false(by definition, 0 and 1 are not prime numbers)
isPrime[0] = false;
isPrime[1] = false;Step 2: Eliminate Composite Numbers
Iterate from i = 2 to √n:
for (int i = 2; i * i <= n; i++) {
if (isPrime[i]) {
// Mark all multiples of i as composite
for (int j = i * i; j <= n; j += i) {
isPrime[j] = false;
}
}
}Optimization Applied: Start marking from i×i instead of 2×i
Why Stop at √n?: Any composite number ≤ n must have at least one prime factor ≤ √n. Therefore, after processing all primes up to √n, all composites will have been marked.
Step 3: Extract Results
After iteration completes, indices with value true represent all prime numbers between 1 and n:
vector<int> primes;
for (int i = 2; i <= n; i++) {
if (isPrime[i]) {
primes.push_back(i);
}
}Additional Optimization: Skip Even Numbers
Since all even numbers greater than 2 are composite, we can optimize by:
- Handle 2 as a special case
- Only iterate through odd numbers
vector<bool> isPrime(n + 1, true);
isPrime[0] = isPrime[1] = false;
// Handle 2 separately
for (int i = 4; i <= n; i += 2) {
isPrime[i] = false;
}
// Process odd numbers only
for (int i = 3; i * i <= n; i += 2) {
if (isPrime[i]) {
for (int j = i * i; j <= n; j += 2 * i) {
isPrime[j] = false;
}
}
}This optimization approximately halves both time and space requirements.
Complete Implementation Example (C++)
class Solution {
public:
int countPrimes(int n) {
if (n <= 2) return 0;
if (n == 3) return 1;
// Initialize sieve array
vector<bool> isPrime(n, true);
isPrime[0] = isPrime[1] = false;
// Sieve process - optimized version
for (int i = 2; i * i < n; i++) {
if (isPrime[i]) {
// Mark multiples starting from i*i
for (int j = i * i; j < n; j += i) {
isPrime[j] = false;
}
}
}
// Count primes
int count = 0;
for (int i = 2; i < n; i++) {
if (isPrime[i]) {
count++;
}
}
return count;
}
};Alternative: Collecting All Primes
class Solution {
public:
vector<int> getPrimes(int n) {
if (n < 2) return {};
vector<bool> isPrime(n + 1, true);
isPrime[0] = isPrime[1] = false;
for (int i = 2; i * i <= n; i++) {
if (isPrime[i]) {
for (int j = i * i; j <= n; j += i) {
isPrime[j] = false;
}
}
}
vector<int> primes;
for (int i = 2; i <= n; i++) {
if (isPrime[i]) {
primes.push_back(i);
}
}
return primes;
}
};Further Optimized Version (Odd Numbers Only)
class Solution {
public:
int countPrimes(int n) {
if (n <= 2) return 0;
// Only track odd numbers (index i represents number 2*i+1)
int m = (n - 1) / 2;
vector<bool> isPrime(m + 1, true);
for (int i = 1; i * i <= m; i++) {
if (isPrime[i]) {
// Mark odd multiples
int start = 2 * i * (i + 1);
for (int j = start; j <= m; j += 2 * i + 1) {
isPrime[j] = false;
}
}
}
// Count primes (including 2)
int count = 1; // Start with 1 for the prime number 2
for (int i = 1; i <= m; i++) {
if (isPrime[i]) {
count++;
}
}
return count;
}
};Complexity Analysis
Time Complexity: O(n log log n)
Derivation:
The total number of marking operations equals:
- n/2 (for prime 2)
- n/3 (for prime 3)
- n/5 (for prime 5)
- n/7 (for prime 7)
- ...
This sums to: n × (1/2 + 1/3 + 1/5 + 1/7 + ...)
The sum of reciprocals of primes up to n approximates to ln ln n (a result from analytic number theory).
Therefore, total operations ≈ n × ln ln n = O(n log log n)
Why So Efficient?:
- Much better than the naive O(n√n) approach of testing each number individually
- The log log n factor grows extremely slowly
- For n = 10^9, log log n ≈ 3.3
Space Complexity: O(n)
Analysis:
- Requires one boolean value per number up to n
- For very large n, consider bit-level compression to reduce space to O(n/8) bytes
- Trade-off between space and access speed
Comparison with Alternative Approaches
| Method | Time Complexity | Space Complexity | Best Use Case |
|---|---|---|---|
| Trial Division | O(n√n) | O(1) | Testing individual numbers |
| Sieve of Eratosthenes | O(n log log n) | O(n) | Finding all primes up to n |
| Sieve of Atkin | O(n) | O(n) | Very large ranges |
| Segmented Sieve | O(n log log n) | O(√n) | Memory-constrained environments |
Practical Considerations
When to Use This Algorithm
Ideal Scenarios:
- Finding all primes up to 10^7 or 10^8
- Multiple prime-related queries on the same range
- Competitive programming with prime number problems
- Educational purposes for understanding algorithmic thinking
Less Suitable Scenarios:
- Testing primality of a single large number (use Miller-Rabin instead)
- Finding very large primes (use specialized algorithms)
- Memory-constrained environments with large n (use segmented sieve)
Implementation Tips
- Use appropriate data types: For large n, consider
vector<bool>(space-efficient) vsvector<char>(faster access) - Cache optimization: Process in cache-friendly blocks for very large arrays
- Parallelization: The sieve can be parallelized, though synchronization overhead may limit gains
- Precomputation: For multiple queries, precompute once and reuse
Practice Problems
LeetCode 204: Count Primes
Problem: Count the number of prime numbers less than a non-negative integer n.
Link: LeetCode 204
Solution Approach: Implement the Sieve of Eratosthenes as described above.
Additional Practice Problems
- Prime Number of Set Bits in Binary Representation (LeetCode 762)
- Prime Palindromes (LeetCode 866)
- Ugly Number III (LeetCode 1201) - involves prime factorization
- Count Different Palindromic Subsequences - may require prime modular arithmetic
Advanced Extensions
Segmented Sieve
For extremely large ranges where O(n) space is prohibitive, the segmented sieve processes the range in chunks, requiring only O(√n) space while maintaining O(n log log n) time complexity.
Linear Sieve (Sieve of Euler)
An advanced variant that achieves O(n) time complexity by ensuring each composite is marked exactly once, though with increased implementation complexity.
Sieve for Other Properties
The sieving concept extends beyond primality:
- Finding numbers with specific divisor counts
- Identifying perfect numbers
- Computing arithmetic functions (Euler's totient, Möbius function)
Conclusion
The Sieve of Eratosthenes stands as a testament to algorithmic elegance—a 2,000-year-old algorithm that remains practically relevant in modern computing. Its beauty lies in:
- Conceptual Simplicity: Easy to understand and implement
- Mathematical Soundness: Built on solid number-theoretic foundations
- Practical Efficiency: O(n log log n) performance suits many real-world scenarios
- Educational Value: Teaches fundamental algorithmic thinking patterns
Understanding this algorithm provides a foundation for more advanced number-theoretic algorithms and demonstrates how ancient mathematical insights continue to power modern computational systems.
Whether you're solving competitive programming problems, implementing cryptographic systems, or simply appreciating mathematical beauty, the Sieve of Eratosthenes deserves a place in your algorithmic toolkit.
Remember: The optimization of starting from i² rather than 2i isn't merely a performance tweak—it's a direct consequence of understanding why the algorithm works. This deep comprehension transforms the algorithm from a memorized procedure into an intuitive tool you can adapt and extend confidently.