Inspired by the rate limiter chapter of the book System Design Interview, I want to share my understanding of several rate limiting algorithms and implement each of them in Python.

Rate limiting is a crucial technique in system design that controls the rate of requests to a service. This post covers five common rate limiting algorithms:

  • Token Bucket
  • Leaky Bucket
  • Fixed Window Counter
  • Sliding Window Log
  • Sliding Window Counter

1. Token Bucket

The token bucket algorithm maintains a bucket of tokens, where:

  • The bucket starts with a maximum number of tokens (bucket size)
  • Tokens are added to the bucket at a constant rate (tokens per second)
  • When a request arrives:
    • If there are enough tokens, one is consumed and the request is allowed
    • If there aren’t enough tokens, the request is rejected
  • The bucket never exceeds its maximum size
  • This allows for burst traffic up to the bucket size, then smooths out to the refill rate
# token_bucket_demo.py
import time
import threading

class TokenBucket:
    def __init__(self, tokens_per_second: int, bucket_size: int):
        """
        Initialize a token bucket rate limiter.
        
        Args:
            tokens_per_second: Rate at which tokens are added to the bucket
            bucket_size: Maximum number of tokens the bucket can hold
        """
        self.tokens_per_second = tokens_per_second
        self.bucket_size = bucket_size
        self.tokens = bucket_size
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def _refill(self):
        """
        Refill the bucket with tokens based on elapsed time.
        Called from consume() with the lock already held, so it must not
        try to acquire the (non-reentrant) lock again.
        """
        now = time.time()
        time_passed = now - self.last_refill
        new_tokens = time_passed * self.tokens_per_second
        self.tokens = min(self.bucket_size, self.tokens + new_tokens)
        self.last_refill = now

    def consume(self, tokens: int = 1) -> bool:
        """
        Try to consume tokens from the bucket.
        
        Args:
            tokens: Number of tokens to consume (default: 1)
            
        Returns:
            bool: True if tokens were consumed, False if not enough tokens
        """
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

if __name__ == "__main__":
    # Create a token bucket with 2 tokens per second and a bucket size of 5
    bucket = TokenBucket(tokens_per_second=2, bucket_size=5)

    # Simulate 20 requests
    for i in range(20):
        if bucket.consume():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.2)  # Sleep for 0.2 seconds between requests

Run the code:

python token_bucket_demo.py
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Allowed
Request 7: Allowed
Request 8: Rejected
Request 9: Allowed
Request 10: Rejected
Request 11: Allowed
Request 12: Rejected
Request 13: Rejected
Request 14: Allowed
Request 15: Rejected
Request 16: Allowed
Request 17: Rejected
Request 18: Rejected
Request 19: Allowed
Request 20: Rejected

Pros

  • Simple to implement and understand: The token bucket algorithm is conceptually straightforward, making it easy to implement and maintain. Its behavior is predictable and easy to reason about.
  • Allows for burst traffic: By maintaining a bucket of tokens, it can handle sudden spikes in traffic up to the bucket size, which is useful for applications with variable load patterns.
  • Memory efficient: Only needs to track a single token count and last refill time, making it very lightweight in terms of memory usage.
  • Smooths out traffic spikes: After handling burst traffic, it naturally smooths out to the refill rate, preventing sustained high load on the system.
  • Good for APIs and general-purpose rate limiting: Its balance of burst handling and smoothing makes it suitable for most API rate limiting scenarios.

Cons

  • Can be too permissive with burst traffic: The ability to handle bursts might be undesirable in systems that need strict rate limiting.
  • May not be suitable for strict rate limiting: The burst capability means it can’t guarantee a strict maximum rate over very short time periods.
  • Requires careful tuning of bucket size and refill rate: Finding the right balance between burst capacity and refill rate requires careful consideration of the application’s needs.
  • Not ideal for distributed systems without additional coordination: In distributed environments, maintaining consistent token counts across multiple instances requires additional coordination mechanisms.
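
To make the first two cons concrete, here is a minimal sketch with a fake clock injected for determinism (`FakeClockTokenBucket` is illustrative, not the class above): a full bucket admits `bucket_size` requests back-to-back, then settles to exactly the refill rate.

```python
class FakeClockTokenBucket:
    def __init__(self, tokens_per_second: int, bucket_size: int):
        self.rate = tokens_per_second
        self.size = bucket_size
        self.tokens = float(bucket_size)
        self.last = 0.0

    def consume(self, now: float) -> bool:
        # Refill from elapsed fake time, capped at the bucket size.
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = FakeClockTokenBucket(tokens_per_second=2, bucket_size=5)
# Burst: six back-to-back requests at t=0 drain the full bucket.
burst = [bucket.consume(0.0) for _ in range(6)]
print(burst)  # [True, True, True, True, True, False]
# Steady state: one request every 0.5s matches the 2/s refill exactly.
steady = [bucket.consume(0.5 * i) for i in range(1, 5)]
print(steady)  # [True, True, True, True]
```

Whether that opening burst of 5 is acceptable is precisely the tuning question raised above.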

2. Leaky Bucket

The leaky bucket algorithm:

  • Requests arrive at the bucket at any rate
  • The bucket has a maximum capacity (queue size)
  • Requests are processed at a constant rate (leak rate)
  • When a request arrives:
    • If there’s space in the queue, it’s added
    • If the queue is full, the request is rejected
  • A background thread continuously drains the queue at the fixed rate
# leaky_bucket_demo.py
import time
import threading
from queue import Queue

class LeakyBucket:
    def __init__(self, rate_per_second: int, capacity: int):
        """
        Initialize a leaky bucket rate limiter.
        
        Args:
            rate_per_second: Rate at which requests are processed
            capacity: Maximum number of requests that can be queued
        """
        self.rate_per_second = rate_per_second
        self.capacity = capacity
        self.queue = Queue(maxsize=capacity)
        self.lock = threading.Lock()
        self.processing = False
        self.stop_event = threading.Event()

    def start_processing(self):
        """
        Start the background thread that processes requests at a constant rate.
        This method must be called before adding requests.
        """
        self.processing = True
        self.process_thread = threading.Thread(target=self._process_requests)
        self.process_thread.daemon = True
        self.process_thread.start()

    def _process_requests(self):
        """
        Background thread that processes requests at a constant rate.
        This method runs continuously until stop_processing is called.
        """
        while not self.stop_event.is_set():
            with self.lock:
                if not self.queue.empty():
                    request_id = self.queue.get()
                    print(f"Processing request {request_id}...")

            time.sleep(1.0 / self.rate_per_second)

    def add_request(self, request_id: int) -> bool:
        """
        Try to add a request to the bucket.

        Args:
            request_id: The ID of the request to add

        Returns:
            bool: True if request was added to queue, False if queue is full
        """
        with self.lock:
            if self.queue.full():
                return False
            self.queue.put(request_id)
            return True

    def stop_processing(self):
        """
        Stop the background processing thread.
        This method should be called when the rate limiter is no longer needed.
        """
        self.stop_event.set()
        self.processing = False
        if hasattr(self, 'process_thread'):
            self.process_thread.join()

if __name__ == "__main__":
    # Create a leaky bucket that processes 2 requests per second
    # with a maximum queue size of 5
    bucket = LeakyBucket(rate_per_second=2, capacity=5)
    bucket.start_processing()

    try:
        # Simulate 20 requests
        for i in range(20):
            if bucket.add_request(i + 1):
                print(f"Request {i + 1}: Added to queue")
            else:
                print(f"Request {i + 1}: Queue full, request rejected")
            time.sleep(0.2)  # Sleep for 0.2 seconds between requests

        # Wait for all requests to be processed
        time.sleep(5)
    finally:
        bucket.stop_processing()

Run the code:

python leaky_bucket_demo.py
Request 1: Added to queue
Request 2: Added to queue
Request 3: Added to queue
Processing request 1...
Request 4: Added to queue
Request 5: Added to queue
Processing request 2...
Request 6: Added to queue
Request 7: Added to queue
Request 8: Queue full, request rejected
Processing request 3...
Request 9: Added to queue
Request 10: Queue full, request rejected
Processing request 4...
Request 11: Added to queue
Request 12: Queue full, request rejected
Request 13: Queue full, request rejected
Processing request 5...
Request 14: Added to queue
Request 15: Queue full, request rejected
Processing request 6...
Request 16: Added to queue
Request 17: Queue full, request rejected
Request 18: Queue full, request rejected
Processing request 7...
Request 19: Added to queue
Request 20: Queue full, request rejected
Processing request 9...
Processing request 11...
Processing request 14...
Processing request 16...
Processing request 19...

Pros

  • Provides a constant output rate: Guarantees a consistent processing rate, which is crucial for systems that need predictable throughput.
  • Smooths out traffic bursts: By queuing requests and processing them at a fixed rate, it effectively eliminates traffic spikes.
  • Good for network traffic shaping: Its constant output rate makes it ideal for network traffic management and bandwidth control.
  • Predictable behavior: The fixed processing rate makes system behavior more predictable and easier to plan for.
  • Memory efficient when queue size is limited: With a bounded queue size, it uses a predictable amount of memory.

Cons

  • Can cause request delays: Requests may have to wait in the queue before being processed, potentially increasing latency.
  • May drop requests when queue is full: Unlike token bucket, it doesn’t allow bursts beyond the queue capacity.
  • Not suitable for bursty traffic patterns: The constant processing rate might be too restrictive for applications with naturally bursty traffic.
  • Requires background processing thread: The continuous processing of requests adds complexity to the implementation.
  • More complex implementation than token bucket: Managing the queue and processing thread requires more code and careful synchronization.
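
The last two cons can be avoided with a thread-free variant: instead of a real queue and a worker, track only the theoretical arrival time of the next request (the idea behind the Generic Cell Rate Algorithm used in traffic shaping). A minimal sketch with the time passed in explicitly; `VirtualLeakyBucket` is an assumption for illustration, not part of the implementation above.

```python
class VirtualLeakyBucket:
    def __init__(self, rate_per_second: int, capacity: int):
        self.interval = 1.0 / rate_per_second  # time to leak one request
        self.capacity = capacity
        self.tat = 0.0  # theoretical arrival time of the next leak

    def allow(self, now: float) -> bool:
        tat = max(self.tat, now)
        # Reject if the virtual queue already holds `capacity` requests.
        if tat - now >= self.capacity * self.interval:
            return False
        self.tat = tat + self.interval
        return True

bucket = VirtualLeakyBucket(rate_per_second=2, capacity=5)
# 7 simultaneous arrivals at t=0: capacity is 5, so two are dropped.
results = [bucket.allow(0.0) for _ in range(7)]
print(results)  # [True, True, True, True, True, False, False]
```

The trade-off: this version only decides admit/reject at arrival time; the caller, not a worker thread, is responsible for actually spacing out the work.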

3. Fixed Window Counter

The fixed window counter algorithm:

  • Time is divided into fixed windows (e.g., 1 second)
  • Each window has a maximum request limit
  • When a request arrives:
    • If the current window’s counter is below the limit, increment and allow
    • If the counter is at the limit, reject the request
  • At the start of each new window, the counter resets to zero
# fixed_window_counter_demo.py
import time
import threading

class FixedWindowCounter:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a fixed window counter rate limiter.
        
        Args:
            max_requests: Maximum number of requests allowed in a window
            window_size: Size of the time window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.window_start = time.time()
        self.counter = 0
        self.lock = threading.Lock()

    def _reset_window(self):
        """
        Reset the counter if the current window has expired.
        This method is called before each request check.
        """
        now = time.time()
        if now - self.window_start >= self.window_size:
            self.window_start = now
            self.counter = 0

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed.
        
        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._reset_window()
            if self.counter < self.max_requests:
                self.counter += 1
                return True
            return False

if __name__ == "__main__":
    # Create a fixed window counter rate limiter
    # with a maximum of 5 requests per 1-second window
    limiter = FixedWindowCounter(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests

Run the code:

python fixed_window_counter_demo.py
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Allowed
Request 13: Allowed
Request 14: Allowed
Request 15: Allowed
Request 16: Rejected
Request 17: Rejected
Request 18: Rejected
Request 19: Rejected
Request 20: Rejected

Pros

  • Very simple to implement: Requires minimal code and is easy to understand and maintain.
  • Memory efficient: Only needs to track a single counter and window start time, using very little memory.
  • Low computational overhead: Simple counter operations make it very fast and lightweight.
  • Easy to understand and debug: The fixed window concept is intuitive and straightforward to reason about.
  • Good for simple use cases: Perfect for basic rate limiting needs where exact precision isn’t required.

Cons

  • Allows bursts at window boundaries: The reset of counters at window boundaries can lead to request bursts, potentially causing system overload.
  • Less accurate than sliding window approaches: The fixed windows can lead to inaccuracies in rate limiting, especially around window boundaries.
  • Can be unfair if traffic is unevenly distributed: Traffic patterns might be unfairly limited due to the fixed window nature.
  • Not suitable for strict rate limiting: The burst potential makes it unsuitable for scenarios requiring precise rate control.
  • May cause thundering herd problems: The window reset can cause multiple requests to be processed simultaneously, potentially overwhelming the system.
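
The boundary-burst con is easy to demonstrate with a fake clock. This `FakeClockFixedWindow` is a sketch that aligns windows to multiples of `window_size` (unlike the class above, which restarts the window on demand) so the boundary is explicit:

```python
class FakeClockFixedWindow:
    def __init__(self, max_requests: int, window_size: int):
        self.max_requests = max_requests
        self.window_size = window_size
        self.window = -1
        self.counter = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_size)
        if window != self.window:  # crossed a window boundary: reset
            self.window = window
            self.counter = 0
        if self.counter < self.max_requests:
            self.counter += 1
            return True
        return False

limiter = FakeClockFixedWindow(max_requests=5, window_size=1)
late = [limiter.allow(0.9) for _ in range(5)]   # tail of window 0
early = [limiter.allow(1.0) for _ in range(5)]  # head of window 1
# All 10 are allowed within 0.1 simulated seconds, double the "5/second" limit.
print(late + early)
```

Ten allowed requests in a tenth of a second is exactly the overload the sliding window algorithms below are designed to prevent.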

4. Sliding Window Log

The sliding window log algorithm:

  • Maintains a log of all request timestamps
  • When a request arrives:
    • Remove timestamps older than the window size
    • Count remaining timestamps
    • If count is below limit, add new timestamp and allow
    • If count is at limit, reject the request
  • The window slides continuously with time
  • Provides exact counting within any time window
# sliding_window_log_demo.py
import time
import threading
from collections import deque

class SlidingWindowLog:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a sliding window log rate limiter.
        
        Args:
            max_requests: Maximum number of requests allowed in the window
            window_size: Size of the sliding window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.requests = deque()
        self.lock = threading.Lock()

    def _clean_old_requests(self):
        """
        Remove timestamps of requests that are outside the current window.
        This method is called before each request check.
        """
        now = time.time()
        while self.requests and now - self.requests[0] >= self.window_size:
            self.requests.popleft()

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed.
        
        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._clean_old_requests()
            if len(self.requests) < self.max_requests:
                self.requests.append(time.time())
                return True
            return False

if __name__ == "__main__":
    # Create a sliding window log rate limiter
    # with a maximum of 5 requests per second
    # and a window size of 1 second
    limiter = SlidingWindowLog(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests

Run the code:

python sliding_window_log_demo.py
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Allowed
Request 13: Allowed
Request 14: Allowed
Request 15: Allowed
Request 16: Rejected
Request 17: Rejected
Request 18: Rejected
Request 19: Rejected
Request 20: Rejected

Pros

  • Most accurate rate limiting: Maintains exact counts of requests within any time window, providing precise rate limiting.
  • No burst traffic allowed: The sliding window nature prevents any form of burst traffic, ensuring consistent load.
  • Precise control over request timing: Exact timestamps allow for fine-grained control over request processing.
  • Fair distribution of requests: The continuous sliding window ensures fair treatment of all requests regardless of timing.
  • Good for high-accuracy requirements: Ideal for systems where precise rate limiting is critical.

Cons

  • Memory intensive: Storing all request timestamps can consume significant memory, especially under high load.
  • Higher computational overhead: Processing and cleaning up timestamps requires more CPU resources.
  • More complex implementation: Managing the timestamp log and window sliding adds complexity to the code.
  • Not suitable for high-throughput systems: The memory and computational overhead make it less ideal for very high-volume systems.
  • May need cleanup of old timestamps: Requires periodic cleanup of old timestamps to prevent memory growth.
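
A quick back-of-envelope on the memory con, assuming CPython where a boxed float costs roughly 24 bytes plus an 8-byte deque slot (both figures are rough assumptions, not measurements of the code above):

```python
bytes_per_entry = 24 + 8          # rough CPython float object + deque slot
rps = 10_000                      # sustained requests per second
window_seconds = 60               # sliding window size
entries = rps * window_seconds    # 600,000 timestamps kept live
mib = entries * bytes_per_entry / 1024**2
print(f"{mib:.1f} MiB")           # roughly 18 MiB for a single client key
```

Multiply by thousands of client keys and the log dominates memory, which is what motivates the counter-based approximation in the next section.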

5. Sliding Window Counter

The sliding window counter algorithm:

  • Maintains two counters: current window and previous window
  • When a request arrives:
    • Update window boundaries if needed
    • Calculate weighted count from both windows
    • If total is below limit, increment current window and allow
    • If total is at limit, reject the request
  • Uses weighted average to approximate sliding window
  • Provides good accuracy with minimal memory usage
# sliding_window_counter_demo.py
import time
import threading

class SlidingWindowCounter:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a sliding window counter rate limiter.
        
        Args:
            max_requests: Maximum number of requests allowed in the window
            window_size: Size of the sliding window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.current_window = 0
        self.previous_window = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def _reset_window(self):
        """
        Roll the window counters based on elapsed time.
        This method is called before each request check.
        """
        now = time.time()
        elapsed = now - self.window_start
        if elapsed >= 2 * self.window_size:
            # An entire window passed with no requests: both counts are stale.
            self.previous_window = 0
            self.current_window = 0
            self.window_start = now
        elif elapsed >= self.window_size:
            self.previous_window = self.current_window
            self.current_window = 0
            self.window_start = now

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed using weighted counters.
        
        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._reset_window()
            current_time = time.time()
            window_progress = (current_time - self.window_start) / self.window_size
            
            # Weighted count of previous window
            weighted_count = self.previous_window * (1 - window_progress)
            total_count = weighted_count + self.current_window
            
            if total_count < self.max_requests:
                self.current_window += 1
                return True
            return False

if __name__ == "__main__":
    # Create a sliding window counter rate limiter
    # with a maximum of 5 requests per second
    # and a window size of 1 second
    limiter = SlidingWindowCounter(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests

Run the code:

python sliding_window_counter_demo.py
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Rejected
Request 13: Allowed
Request 14: Rejected
Request 15: Allowed
Request 16: Rejected
Request 17: Allowed
Request 18: Rejected
Request 19: Allowed
Request 20: Rejected

Pros

  • Memory efficient: Only needs two counters regardless of request volume, making it very memory efficient.
  • More accurate than fixed window: The weighted average approach provides better accuracy than simple fixed windows.
  • Good balance of accuracy and performance: Offers a practical compromise between precision and resource usage.
  • Suitable for distributed systems: The simple counter-based approach works well in distributed environments.
  • Handles burst traffic better than fixed window: The weighted average helps smooth out traffic better than fixed windows.

Cons

  • More complex than fixed window: The weighted average calculation adds complexity compared to simple fixed windows.
  • Still less accurate than sliding window log: While more accurate than fixed window, it’s not as precise as the sliding window log.
  • Requires careful tuning of window size: Finding the optimal window size requires consideration of the specific use case.
  • May need synchronization in distributed systems: While suitable for distributed systems, it still requires proper synchronization mechanisms.
  • More computational overhead than fixed window: The weighted average calculations add some computational overhead.
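
To see why the demo output alternates between Allowed and Rejected after the first window rolls over, here is the weighted count from allow_request worked by hand with the demo's numbers (limit 5, 1-second window):

```python
previous_window = 5     # 5 requests landed in the previous window
current_window = 1      # 1 request so far in the current window
window_progress = 0.3   # we are 0.3s into the 1s window

weighted = previous_window * (1 - window_progress)  # 5 * 0.7 = 3.5
total = weighted + current_window                   # 3.5 + 1 = 4.5
print(total < 5)  # True: one more request is still allowed
```

As the window progresses, the previous window's 5 requests decay linearly out of the weighted total, freeing capacity at roughly the target rate, which is why admissions come through one at a time rather than in a burst.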

Further Reading