Algorithm Sharing 01: Sieve of Eratosthenes (Sieve Method) [Beginner Level]

Algorithm Application Scenarios

The Sieve of Eratosthenes efficiently solves problems related to recording prime numbers, including counting primes within a given range.

Core Concept

Fundamental Principle: Eliminate composite numbers, leaving only primes.

Logical Foundation:

If x is a prime number, then all multiples of x must not be prime. Therefore, these numbers can be directly excluded without needing to iterate through them individually in subsequent steps, thereby improving efficiency.

Implementation Requirement:

Since we need to mark all multiples of x as non-prime, we must first create a container to store the primality status of all numbers within the required range before beginning the sieving process.

Data Structure:

We implement this using an array isPrime[]. If number i is prime, then isPrime[i] equals 1; otherwise, it equals 0.

Algorithm Process:

Initialize all values in the array to 1 (assuming all numbers are prime initially)
Iterate from smallest to largest
When encountering a prime number, mark all its multiples (excluding the prime number itself) as composite (set corresponding array index values to 0)
After completion, all indices with value 1 represent prime numbers

This approach makes operations like recording prime counts or outputting all primes straightforward and efficient.

Correctness Explanation

Why This Method Works

From the basic concept described above, we can understand that this algorithm selects composite numbers from the array and marks them as 0. Since we initially mark all array values as 1 (considering all numbers as prime), the algorithm identifies confirmed primes through the process and then eliminates composites, marking them as 0.

Key Observation:

Throughout the process, array values only change from 1 to 0—meaning we're continuously filtering out composite numbers. This method obviously won't mistakenly mark prime numbers as composite (from the method's intuition, we can confirm: prime numbers won't be easily marked as composite, but composite numbers might not be discovered and thus be mistakenly treated as prime).

Addressing Uncertainty

When first encountering this method, one might feel uncertain: Why can we be certain that when we traverse to a certain prime number (i.e., when the array value is 1), it must actually be prime? Could it be that it's actually composite, but previous traversal operations failed to discover it, leaving a "fish that escaped the net"?

Let's Investigate Thoroughly:

First, this method definitely won't mark prime numbers as composite. When we traverse from smallest to largest, if number x is composite, then according to the nature of composite numbers, x has at least one prime factor (note: 1 is not prime). Let's denote this factor as y.

Critical Insight:

During traversal from smallest to largest, we will encounter y before x. When marking multiples of y as composite, we will definitely mark x as composite. Therefore, there won't be any "escaped fish"—no composite numbers remain unmarked and mistakenly become primes.

Mathematical Guarantee:

This ensures the algorithm's correctness: every composite number will be marked by its smallest prime factor before we reach it in the traversal.

Optimization Strategies

Understanding the Optimization Opportunity

Now that we understand why the correctness holds, the optimization operations become easier to comprehend.

The Problem:

For a prime number x, marking from 2x as composite according to the previous method involves redundant operations. Could it be that 2x was already marked as composite previously? If so, wouldn't this operation be wasted effort?

Finding the Optimal Starting Point

Research Question:

From what point should we start marking to avoid both omissions and overlaps?

Key Insight:

From the previous section's discussion that composite number x has at least one prime factor y, we can infer: numbers before x×x don't need marking.

Reasoning:

For k×x where k ranges from 2 to x-1:

Case 1: k is prime

When traversing to k, k×x will definitely be marked as composite

Case 2: k is not prime

Following the same logic as before, k must have a prime factor y
Therefore, k×x must also have a prime factor y
Clearly, when traversing to y, k×x will definitely be marked as composite

The Critical Point: x×x

x×x has only factors x and 1, so it can only be marked as composite by x itself. Therefore, starting marking from x×x both guarantees that previous numbers have been marked and ensures that the starting point x×x can only be marked by x.

Conclusion:

Starting from x×x represents the optimal "starting point."

Algorithm Steps

Step 1: Initialize the Sieve

Create a boolean type (int is also acceptable) array isPrime[n + 1] with length n + 1.

Array Structure:

Array indices represent corresponding numbers
Array values indicate whether the corresponding number is prime
Example: isPrime[11] = 1 means number 11 is prime

Initialization:

Set all array values initially to true
Set positions 0 and 1 to false (these two numbers are defined as non-prime)

Step 2: Eliminate Composite Numbers

Traverse from i = 2 to √n:

Condition: If isPrime[i] = true

Action: Mark all multiples of i as false

Optimized Approach: Start marking from i×i instead of 2×i

Step 3: Extract Results

After traversal completion, all indices with value true represent all prime numbers between 1 and n.

Implementation Code (C++)

class Solution {
public:
    int countPrimes(int n) {
        if (n <= 2) return 0;
        if (n == 3) return 1;
        
        vector<bool> isPrime(n, true);
        
        // Sieve process: only need to check up to sqrt(n)
        for(int i = 3; i * i < n; i += 2) {
            if(isPrime[i]) {
                // Mark multiples starting from i*i
                for(int j = i * i; j < n; j += i) {
                    isPrime[j] = false;
                }
            }
        }
        
        // Count remaining primes
        int count = 1; // Start with 1 to account for prime number 2
        for(int i = 3; i < n; i += 2) {
            if(isPrime[i]) {
                count++;
            }
        }
        
        return count;
    }
};

Implementation Notes:

The outer loop only needs to iterate up to √n because any composite number less than n must have a prime factor less than or equal to √n
The inner loop starts from i×i based on the optimization discussed earlier
We can skip even numbers (except 2) to improve efficiency
For further optimization, certain operations can be performed synchronously during the sieving process, but ensure that operated numbers have already been processed

Complexity Analysis

Time Complexity: O(n log log n)

Derivation:

The time complexity comes from the sum of the harmonic series of primes:

n/2 + n/3 + n/5 + n/7 + n/11 + ... 
= n × (1/2 + 1/3 + 1/5 + 1/7 + 1/11 + ...)
≈ n × log log n

This represents one of the most efficient algorithms for finding all primes up to n.

Space Complexity: O(n)

Explanation:

We require an array of size n + 1 to store the primality status of each number. This is optimal for algorithms that need to return or count all primes up to n.

Practice Problems

Recommended Exercise

LeetCode 204: Count Primes

Problem Link

Problem Description:

Given an integer n, return the number of prime numbers that are strictly less than n.

Example 1:

Input: n = 10
Output: 4
Explanation: There are 4 prime numbers less than 10: 2, 3, 5, 7

Example 2:

Input: n = 0
Output: 0

Example 3:

Input: n = 1
Output: 0

Constraints:

0 ≤ n ≤ 5 × 10^6

Solution Approach

The Sieve of Eratosthenes provides an optimal solution for this problem given the constraint range. The algorithm efficiently handles the upper bound of 5 million within acceptable time limits.

Advanced Considerations

Segmented Sieve

For very large values of n where O(n) space becomes prohibitive, consider the Segmented Sieve approach, which reduces space complexity to O(√n) while maintaining similar time complexity.

Linear Sieve

For applications requiring additional number-theoretic functions (like Euler's totient function), the Linear Sieve (also known as Euler's Sieve) achieves O(n) time complexity with the ability to compute multiplicative functions during the sieving process.

Conclusion

The Sieve of Eratosthenes represents one of the most elegant and efficient algorithms in computational number theory. Its simplicity, combined with excellent performance characteristics, makes it an essential tool for any programmer working with prime numbers.

Key takeaways:

Understanding the correctness proof builds confidence in the algorithm
The optimization from 2x to x² significantly improves practical performance
The algorithm's time complexity of O(n log log n) is nearly linear
Space complexity of O(n) is acceptable for most practical applications

Master this algorithm, and you'll have a powerful tool for solving a wide range of number-theoretic problems efficiently.