Algorithm Sharing 01: Sieve of Eratosthenes - Comprehensive Guide to Prime Number Sieving

Algorithm Application Scenarios

The Sieve of Eratosthenes excels in problems requiring efficient recording of prime numbers, including:

Counting primes within a given range
Generating lists of prime numbers up to N
Prime-related mathematical computations
Cryptographic applications requiring prime generation
Competitive programming problems involving prime numbers

This ancient algorithm, despite its age, remains highly relevant in modern computing due to its elegant simplicity and impressive efficiency for finding all primes up to a specified limit.

Core Conceptual Foundation

Fundamental Principle

The sieve operates on an elegantly simple principle: eliminate composite numbers, leaving only primes.

The logical foundation rests on this mathematical truth: If x is a prime number, then all multiples of x (except x itself) cannot be prime. These multiples can be directly excluded without needing subsequent individual verification, significantly reducing computational overhead.

Implementation Strategy

Since we need to record whether multiples of x are prime, we must first create a container to store the primality status of all numbers within our target range.

We implement this through a boolean array isPrime[], where:

If number i is prime: isPrime[i] = true (or 1)
If number i is composite: isPrime[i] = false (or 0)

Algorithmic Process

The algorithm proceeds as follows:

Initialize all entries to true (assuming all numbers are prime initially)
Iterate through numbers from smallest to largest
When encountering a prime number, mark all its multiples (excluding the prime itself) as composite (set corresponding array values to 0)
After completing the iteration, indices with value 1 represent prime numbers

This approach makes recording prime counts, outputting all primes, and related operations straightforward and efficient.

Correctness Proof

Initial State Analysis

From the basic algorithmic concept, we understand that this method selects composite numbers from within the range and marks them as 0. Since we initialize the array with all values set to 1 (initially assuming all numbers are prime), the algorithm identifies confirmed primes and uses them to eliminate composites, marking them as 0.

Throughout this process, array values only transition from 1 to 0—essentially picking out composite numbers as described initially. This method clearly won't mistakenly mark prime numbers as composite.

Addressing Uncertainty

When first encountering this algorithm, one might feel uncertain: How can we be certain that when we reach a number marked as prime (array value 1), it truly is prime? Could it be that this number is actually composite, but previous iterations failed to identify it, allowing it to slip through as a "fish escaping the net"?

Let's investigate this thoroughly:

Proof of No False Negatives

This method will never mark a prime number as composite.

When iterating from smallest to largest, if number x is composite, then by the nature of composite numbers, x must have at least one prime factor (note: 1 is not prime). Let's denote this prime factor as y.

During iteration from smallest to largest, we will encounter y before x (since y < x). When marking multiples of y as composite, we will definitely mark x as composite. Therefore, no composite number can escape detection and be mistakenly treated as prime.

Mathematical Guarantee

This proof relies on the fundamental theorem of arithmetic: every composite number has at least one prime factor less than or equal to its square root. By systematically eliminating multiples of each discovered prime, we guarantee that all composite numbers will be marked before we reach them in our iteration.

The algorithm's elegance lies in its self-reinforcing nature: each newly discovered prime helps identify more composites, which in turn confirms the primality of remaining unmarked numbers.

Optimization Strategies

Identifying Redundant Operations

Now that we understand why correctness holds, optimization becomes easier to comprehend.

When marking multiples of prime x using the previous method (starting from 2x), we encounter redundant operations: Couldn't 2x have already been marked as composite earlier? If so, this operation would be wasted effort.

Finding the Optimal Starting Point

We need to research: From which point should we start marking to avoid both omissions and overlaps?

From the previous section's insight that composite number x must have at least one prime factor y, we can deduce:

Numbers before x×x don't need marking because for any k×x where k ranges from 2 to x-1:

Case 1: k is prime

When we iterated to k, k×x would have been marked as composite

Case 2: k is not prime

As previously established, k must have a prime factor y
Therefore k×x must also have prime factor y
Clearly, when we iterated to y, k×x would have been marked as composite

Why x×x is the Optimal Starting Point

The number x×x has only factors x and 1, so it can only be marked as composite by x itself. Therefore, starting marking from x×x guarantees:

All previous multiples have already been marked
The starting point x×x can only be marked by x

This makes x×x the optimal "starting point" for marking multiples.

Performance Impact

This optimization significantly improves performance, especially for larger ranges:

For prime 2: Start marking from 4 (no change)
For prime 3: Start marking from 9 (instead of 6)
For prime 5: Start marking from 25 (instead of 10)
For prime 101: Start marking from 10,201 (instead of 202)

The savings become increasingly substantial as we progress through larger primes.

Detailed Algorithm Steps

Step 1: Initialize the Sieve

Create a boolean array isPrime of length n+1:

vector<bool> isPrime(n + 1, true);

Array Structure:

Array indices represent corresponding numbers
Array values indicate whether the corresponding number is prime
Example: isPrime[11] = true means number 11 is prime

Initial Configuration:

Set all array values to true initially
Important: Set indices 0 and 1 to false (by definition, 0 and 1 are not prime numbers)

isPrime[0] = false;
isPrime[1] = false;

Step 2: Eliminate Composite Numbers

Iterate from i = 2 to √n:

for (int i = 2; i * i <= n; i++) {
    if (isPrime[i]) {
        // Mark all multiples of i as composite
        for (int j = i * i; j <= n; j += i) {
            isPrime[j] = false;
        }
    }
}

Optimization Applied: Start marking from i×i instead of 2×i

Why Stop at √n?: Any composite number ≤ n must have at least one prime factor ≤ √n. Therefore, after processing all primes up to √n, all composites will have been marked.

Step 3: Extract Results

After iteration completes, indices with value true represent all prime numbers between 1 and n:

vector<int> primes;
for (int i = 2; i <= n; i++) {
    if (isPrime[i]) {
        primes.push_back(i);
    }
}

Additional Optimization: Skip Even Numbers

Since all even numbers greater than 2 are composite, we can optimize by:

Handle 2 as a special case
Only iterate through odd numbers

vector<bool> isPrime(n + 1, true);
isPrime[0] = isPrime[1] = false;

// Handle 2 separately
for (int i = 4; i <= n; i += 2) {
    isPrime[i] = false;
}

// Process odd numbers only
for (int i = 3; i * i <= n; i += 2) {
    if (isPrime[i]) {
        for (int j = i * i; j <= n; j += 2 * i) {
            isPrime[j] = false;
        }
    }
}

This optimization approximately halves both time and space requirements.

Complete Implementation Example (C++)

class Solution {
public:
    int countPrimes(int n) {
        if (n <= 2) return 0;
        if (n == 3) return 1;
        
        // Initialize sieve array
        vector<bool> isPrime(n, true);
        isPrime[0] = isPrime[1] = false;
        
        // Sieve process - optimized version
        for (int i = 2; i * i < n; i++) {
            if (isPrime[i]) {
                // Mark multiples starting from i*i
                for (int j = i * i; j < n; j += i) {
                    isPrime[j] = false;
                }
            }
        }
        
        // Count primes
        int count = 0;
        for (int i = 2; i < n; i++) {
            if (isPrime[i]) {
                count++;
            }
        }
        
        return count;
    }
};

Alternative: Collecting All Primes

class Solution {
public:
    vector<int> getPrimes(int n) {
        if (n < 2) return {};
        
        vector<bool> isPrime(n + 1, true);
        isPrime[0] = isPrime[1] = false;
        
        for (int i = 2; i * i <= n; i++) {
            if (isPrime[i]) {
                for (int j = i * i; j <= n; j += i) {
                    isPrime[j] = false;
                }
            }
        }
        
        vector<int> primes;
        for (int i = 2; i <= n; i++) {
            if (isPrime[i]) {
                primes.push_back(i);
            }
        }
        
        return primes;
    }
};

Further Optimized Version (Odd Numbers Only)

class Solution {
public:
    int countPrimes(int n) {
        if (n <= 2) return 0;
        
        // Only track odd numbers (index i represents number 2*i+1)
        int m = (n - 1) / 2;
        vector<bool> isPrime(m + 1, true);
        
        for (int i = 1; i * i <= m; i++) {
            if (isPrime[i]) {
                // Mark odd multiples
                int start = 2 * i * (i + 1);
                for (int j = start; j <= m; j += 2 * i + 1) {
                    isPrime[j] = false;
                }
            }
        }
        
        // Count primes (including 2)
        int count = 1; // Start with 1 for the prime number 2
        for (int i = 1; i <= m; i++) {
            if (isPrime[i]) {
                count++;
            }
        }
        
        return count;
    }
};

Complexity Analysis

Time Complexity: O(n log log n)

Derivation:

The total number of marking operations equals:

n/2 (for prime 2)
- n/3 (for prime 3)
- n/5 (for prime 5)
- n/7 (for prime 7)
...

This sums to: n × (1/2 + 1/3 + 1/5 + 1/7 + ...)

The sum of reciprocals of primes up to n approximates to ln ln n (a result from analytic number theory).

Therefore, total operations ≈ n × ln ln n = O(n log log n)

Why So Efficient?:

Much better than the naive O(n√n) approach of testing each number individually
The log log n factor grows extremely slowly
For n = 10^9, log log n ≈ 3.3

Space Complexity: O(n)

Analysis:

Requires one boolean value per number up to n
For very large n, consider bit-level compression to reduce space to O(n/8) bytes
Trade-off between space and access speed

Comparison with Alternative Approaches

Method	Time Complexity	Space Complexity	Best Use Case
Trial Division	O(n√n)	O(1)	Testing individual numbers
Sieve of Eratosthenes	O(n log log n)	O(n)	Finding all primes up to n
Sieve of Atkin	O(n)	O(n)	Very large ranges
Segmented Sieve	O(n log log n)	O(√n)	Memory-constrained environments

Practical Considerations

When to Use This Algorithm

Ideal Scenarios:

Finding all primes up to 10^7 or 10^8
Multiple prime-related queries on the same range
Competitive programming with prime number problems
Educational purposes for understanding algorithmic thinking

Less Suitable Scenarios:

Testing primality of a single large number (use Miller-Rabin instead)
Finding very large primes (use specialized algorithms)
Memory-constrained environments with large n (use segmented sieve)

Implementation Tips

Use appropriate data types: For large n, consider vector<bool> (space-efficient) vs vector<char> (faster access)
Cache optimization: Process in cache-friendly blocks for very large arrays
Parallelization: The sieve can be parallelized, though synchronization overhead may limit gains
Precomputation: For multiple queries, precompute once and reuse

Practice Problems

LeetCode 204: Count Primes

Problem: Count the number of prime numbers less than a non-negative integer n.

Link: LeetCode 204

Solution Approach: Implement the Sieve of Eratosthenes as described above.

Additional Practice Problems

Prime Number of Set Bits in Binary Representation (LeetCode 762)
Prime Palindromes (LeetCode 866)
Ugly Number III (LeetCode 1201) - involves prime factorization
Count Different Palindromic Subsequences - may require prime modular arithmetic

Advanced Extensions

Segmented Sieve

For extremely large ranges where O(n) space is prohibitive, the segmented sieve processes the range in chunks, requiring only O(√n) space while maintaining O(n log log n) time complexity.

Linear Sieve (Sieve of Euler)

An advanced variant that achieves O(n) time complexity by ensuring each composite is marked exactly once, though with increased implementation complexity.

Sieve for Other Properties

The sieving concept extends beyond primality:

Finding numbers with specific divisor counts
Identifying perfect numbers
Computing arithmetic functions (Euler's totient, Möbius function)

Conclusion

The Sieve of Eratosthenes stands as a testament to algorithmic elegance—a 2,000-year-old algorithm that remains practically relevant in modern computing. Its beauty lies in:

Conceptual Simplicity: Easy to understand and implement
Mathematical Soundness: Built on solid number-theoretic foundations
Practical Efficiency: O(n log log n) performance suits many real-world scenarios
Educational Value: Teaches fundamental algorithmic thinking patterns

Understanding this algorithm provides a foundation for more advanced number-theoretic algorithms and demonstrates how ancient mathematical insights continue to power modern computational systems.

Whether you're solving competitive programming problems, implementing cryptographic systems, or simply appreciating mathematical beauty, the Sieve of Eratosthenes deserves a place in your algorithmic toolkit.

Remember: The optimization of starting from i² rather than 2i isn't merely a performance tweak—it's a direct consequence of understanding why the algorithm works. This deep comprehension transforms the algorithm from a memorized procedure into an intuitive tool you can adapt and extend confidently.