Inspired by the rate limiter chapter in the System Design Interview book, I want to share my understanding of different rate limiter algorithms and implement them in Python.
Rate limiting is a crucial technique in system design that controls the rate of requests sent to a service. This post covers five common rate limiter algorithms:
- Token Bucket
- Leaky Bucket
- Fixed Window Counter
- Sliding Window Log
- Sliding Window Counter
1. Token Bucket
The token bucket algorithm maintains a bucket of tokens, where:
- The bucket starts with a maximum number of tokens (bucket size)
- Tokens are added to the bucket at a constant rate (tokens per second)
- When a request arrives:
  - If there are enough tokens, one is consumed and the request is allowed
  - If there aren’t enough tokens, the request is rejected
- The bucket never exceeds its maximum size
- This allows for burst traffic up to the bucket size, then smooths out to the refill rate
```python
# token_bucket_demo.py
import time
import threading


class TokenBucket:
    def __init__(self, tokens_per_second: int, bucket_size: int):
        """
        Initialize a token bucket rate limiter.

        Args:
            tokens_per_second: Rate at which tokens are added to the bucket
            bucket_size: Maximum number of tokens the bucket can hold
        """
        self.tokens_per_second = tokens_per_second
        self.bucket_size = bucket_size
        self.tokens = bucket_size
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def _refill(self):
        """
        Refill the bucket with tokens based on elapsed time.

        Called from consume() while the lock is already held, so it must
        not acquire the lock itself (threading.Lock is not reentrant).
        """
        now = time.time()
        time_passed = now - self.last_refill
        new_tokens = time_passed * self.tokens_per_second
        self.tokens = min(self.bucket_size, self.tokens + new_tokens)
        self.last_refill = now

    def consume(self, tokens: int = 1) -> bool:
        """
        Try to consume tokens from the bucket.

        Args:
            tokens: Number of tokens to consume (default: 1)

        Returns:
            bool: True if tokens were consumed, False if not enough tokens
        """
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False


if __name__ == "__main__":
    # Create a token bucket with 2 tokens per second and a bucket size of 5
    bucket = TokenBucket(tokens_per_second=2, bucket_size=5)

    # Simulate 20 requests
    for i in range(20):
        if bucket.consume():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.2)  # Sleep for 0.2 seconds between requests
```
Run the code:

```shell
python token_bucket_demo.py
```

```
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Allowed
Request 7: Allowed
Request 8: Rejected
Request 9: Allowed
Request 10: Rejected
Request 11: Allowed
Request 12: Rejected
Request 13: Rejected
Request 14: Allowed
Request 15: Rejected
Request 16: Allowed
Request 17: Rejected
Request 18: Rejected
Request 19: Allowed
Request 20: Rejected
```
Pros
- Simple to implement and understand: The token bucket algorithm is conceptually straightforward, making it easy to implement and maintain. Its behavior is predictable and easy to reason about.
- Allows for burst traffic: By maintaining a bucket of tokens, it can handle sudden spikes in traffic up to the bucket size, which is useful for applications with variable load patterns.
- Memory efficient: Only needs to track a single token count and last refill time, making it very lightweight in terms of memory usage.
- Smooths out traffic spikes: After handling burst traffic, it naturally smooths out to the refill rate, preventing sustained high load on the system.
- Good for APIs and general-purpose rate limiting: Its balance of burst handling and smoothing makes it suitable for most API rate limiting scenarios.
Cons
- Can be too permissive with burst traffic: The ability to handle bursts might be undesirable in systems that need strict rate limiting.
- May not be suitable for strict rate limiting: The burst capability means it can’t guarantee a strict maximum rate over very short time periods.
- Requires careful tuning of bucket size and refill rate: Finding the right balance between burst capacity and refill rate requires careful consideration of the application’s needs.
- Not ideal for distributed systems without additional coordination: In distributed environments, maintaining consistent token counts across multiple instances requires additional coordination mechanisms.
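To make the burst-then-smooth behavior concrete, here is a minimal sketch of the same token bucket logic with an injectable clock. The `FakeClock` and `BurstDemoBucket` names are illustration aids of mine, not part of the implementation above; the point is that a freshly filled bucket admits `bucket_size` requests at once, after which admissions are gated by the refill rate.

```python
class FakeClock:
    """Manually advanced clock, used only to make the demo repeatable."""
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now


class BurstDemoBucket:
    def __init__(self, tokens_per_second, bucket_size, clock):
        self.tokens_per_second = tokens_per_second
        self.bucket_size = bucket_size
        self.tokens = float(bucket_size)
        self.clock = clock
        self.last_refill = clock.time()

    def consume(self, tokens=1):
        # Lazy refill based on elapsed (fake) time, capped at bucket_size.
        now = self.clock.time()
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.tokens_per_second)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


clock = FakeClock()
bucket = BurstDemoBucket(tokens_per_second=2, bucket_size=5, clock=clock)

# Burst: all 5 initial tokens can be spent instantly; the 6th request fails.
burst = [bucket.consume() for _ in range(6)]
print(burst)

# After 1 simulated second, only 2 tokens have been refilled,
# so only 2 of the next 3 requests succeed.
clock.now += 1.0
after_refill = [bucket.consume() for _ in range(3)]
print(after_refill)
```

The same clock-injection pattern is also handy for unit-testing any of the limiters in this post without real sleeps.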
2. Leaky Bucket
The leaky bucket algorithm:
- Requests arrive at the bucket at any rate
- The bucket has a maximum capacity (queue size)
- Requests are processed at a constant rate (leak rate)
- When a request arrives:
  - If there’s space in the queue, it’s added
  - If the queue is full, the request is rejected
- The background thread continuously processes requests at the fixed rate
```python
# leaky_bucket_demo.py
import time
import threading
from queue import Queue


class LeakyBucket:
    def __init__(self, rate_per_second: int, capacity: int):
        """
        Initialize a leaky bucket rate limiter.

        Args:
            rate_per_second: Rate at which requests are processed
            capacity: Maximum number of requests that can be queued
        """
        self.rate_per_second = rate_per_second
        self.capacity = capacity
        self.queue = Queue(maxsize=capacity)
        self.lock = threading.Lock()
        self.processing = False
        self.stop_event = threading.Event()

    def start_processing(self):
        """
        Start the background thread that processes requests at a constant rate.
        This method must be called before adding requests.
        """
        self.processing = True
        self.process_thread = threading.Thread(target=self._process_requests)
        self.process_thread.daemon = True
        self.process_thread.start()

    def _process_requests(self):
        """
        Background thread that processes requests at a constant rate.
        This method runs continuously until stop_processing is called.
        """
        while not self.stop_event.is_set():
            with self.lock:
                if not self.queue.empty():
                    request_id = self.queue.get()
                    print(f"Processing request {request_id}...")
            time.sleep(1.0 / self.rate_per_second)

    def add_request(self, request_id: int) -> bool:
        """
        Try to add a request to the bucket.

        Args:
            request_id: The ID of the request to add

        Returns:
            bool: True if request was added to queue, False if queue is full
        """
        with self.lock:
            if self.queue.full():
                return False
            self.queue.put(request_id)
            return True

    def stop_processing(self):
        """
        Stop the background processing thread.
        This method should be called when the rate limiter is no longer needed.
        """
        self.stop_event.set()
        self.processing = False
        if hasattr(self, 'process_thread'):
            self.process_thread.join()


if __name__ == "__main__":
    # Create a leaky bucket that processes 2 requests per second
    # with a maximum queue size of 5
    bucket = LeakyBucket(rate_per_second=2, capacity=5)
    bucket.start_processing()

    try:
        # Simulate 20 requests
        for i in range(20):
            if bucket.add_request(i + 1):
                print(f"Request {i + 1}: Added to queue")
            else:
                print(f"Request {i + 1}: Queue full, request rejected")
            time.sleep(0.2)  # Sleep for 0.2 seconds between requests

        # Wait for all requests to be processed
        time.sleep(5)
    finally:
        bucket.stop_processing()
```
Run the code:

```shell
python leaky_bucket_demo.py
```

```
Request 1: Added to queue
Request 2: Added to queue
Request 3: Added to queue
Processing request 1...
Request 4: Added to queue
Request 5: Added to queue
Processing request 2...
Request 6: Added to queue
Request 7: Added to queue
Request 8: Queue full, request rejected
Processing request 3...
Request 9: Added to queue
Request 10: Queue full, request rejected
Processing request 4...
Request 11: Added to queue
Request 12: Queue full, request rejected
Request 13: Queue full, request rejected
Processing request 5...
Request 14: Added to queue
Request 15: Queue full, request rejected
Processing request 6...
Request 16: Added to queue
Request 17: Queue full, request rejected
Request 18: Queue full, request rejected
Processing request 7...
Request 19: Added to queue
Request 20: Queue full, request rejected
Processing request 9...
Processing request 11...
Processing request 14...
Processing request 16...
Processing request 19...
```
Pros
- Provides a constant output rate: Guarantees a consistent processing rate, which is crucial for systems that need predictable throughput.
- Smooths out traffic bursts: By queuing requests and processing them at a fixed rate, it effectively eliminates traffic spikes.
- Good for network traffic shaping: Its constant output rate makes it ideal for network traffic management and bandwidth control.
- Predictable behavior: The fixed processing rate makes system behavior more predictable and easier to plan for.
- Memory efficient when queue size is limited: With a bounded queue size, it uses a predictable amount of memory.
Cons
- Can cause request delays: Requests may have to wait in the queue before being processed, potentially increasing latency.
- May drop requests when queue is full: Unlike token bucket, it doesn’t allow bursts beyond the queue capacity.
- Not suitable for bursty traffic patterns: The constant processing rate might be too restrictive for applications with naturally bursty traffic.
- Requires background processing thread: The continuous processing of requests adds complexity to the implementation.
- More complex implementation than token bucket: Managing the queue and processing thread requires more code and careful synchronization.
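On the last two points, the background thread is not strictly required. A common variant treats the leaky bucket as a meter (closely related to the GCRA scheme used in ATM traffic shaping): it tracks only a "water level" that drains continuously, and the caller handles accepted requests itself. The sketch below is my own illustration of that idea, with an injectable clock for deterministic output; the `LeakyBucketMeter` and `FakeClock` names are not from the book.

```python
class FakeClock:
    """Manually advanced clock, used only to make the demo repeatable."""
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now


class LeakyBucketMeter:
    """Leaky bucket as a meter: no queue and no thread. The 'water level'
    drains continuously at the leak rate, and each request adds one unit."""

    def __init__(self, rate_per_second, capacity, clock):
        self.rate_per_second = rate_per_second  # leak (drain) rate
        self.capacity = capacity                # bucket size
        self.level = 0.0                        # current water level
        self.clock = clock
        self.last_leak = clock.time()

    def allow_request(self):
        # Drain the water that leaked out since the last request.
        now = self.clock.time()
        self.level = max(0.0, self.level - (now - self.last_leak) * self.rate_per_second)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False


clock = FakeClock()
meter = LeakyBucketMeter(rate_per_second=2, capacity=5, clock=clock)

# Six instantaneous requests: the bucket holds 5 units, so the 6th overflows.
first = [meter.allow_request() for _ in range(6)]
print(first)

# After 0.5 simulated seconds, exactly one unit has leaked out,
# so exactly one more request fits.
clock.now += 0.5
second = [meter.allow_request() for _ in range(2)]
print(second)
```

This variant shapes admission rather than queuing: accepted requests run immediately, trading the guaranteed constant output rate for a much simpler, thread-free implementation.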
3. Fixed Window Counter
The fixed window counter algorithm:
- Time is divided into fixed windows (e.g., 1 second)
- Each window has a maximum request limit
- When a request arrives:
  - If the current window’s counter is below the limit, increment and allow
  - If the counter is at the limit, reject the request
- At the start of each new window, the counter resets to zero
```python
# fixed_window_counter_demo.py
import time
import threading


class FixedWindowCounter:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a fixed window counter rate limiter.

        Args:
            max_requests: Maximum number of requests allowed in a window
            window_size: Size of the time window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.window_start = time.time()
        self.counter = 0
        self.lock = threading.Lock()

    def _reset_window(self):
        """
        Reset the counter if the current window has expired.
        This method is called before each request check.
        """
        now = time.time()
        if now - self.window_start >= self.window_size:
            self.window_start = now
            self.counter = 0

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed.

        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._reset_window()
            if self.counter < self.max_requests:
                self.counter += 1
                return True
            return False


if __name__ == "__main__":
    # Create a fixed window counter rate limiter
    # with a maximum of 5 requests per window
    # and a window size of 1 second
    limiter = FixedWindowCounter(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests
```
Run the code:

```shell
python fixed_window_counter_demo.py
```

```
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Allowed
Request 13: Allowed
Request 14: Allowed
Request 15: Allowed
Request 16: Rejected
Request 17: Rejected
Request 18: Rejected
Request 19: Rejected
Request 20: Rejected
```
Pros
- Very simple to implement: Requires minimal code and is easy to understand and maintain.
- Memory efficient: Only needs to track a single counter and window start time, using very little memory.
- Low computational overhead: Simple counter operations make it very fast and lightweight.
- Easy to understand and debug: The fixed window concept is intuitive and straightforward to reason about.
- Good for simple use cases: Perfect for basic rate limiting needs where exact precision isn’t required.
Cons
- Allows bursts at window boundaries: The reset of counters at window boundaries can lead to request bursts, potentially causing system overload.
- Less accurate than sliding window approaches: The fixed windows can lead to inaccuracies in rate limiting, especially around window boundaries.
- Can be unfair if traffic is unevenly distributed: Traffic patterns might be unfairly limited due to the fixed window nature.
- Not suitable for strict rate limiting: The burst potential makes it unsuitable for scenarios requiring precise rate control.
- May cause thundering herd problems: The window reset can cause multiple requests to be processed simultaneously, potentially overwhelming the system.
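The boundary-burst problem is easy to demonstrate with a deterministic clock. The `FakeClock` and `FixedWindowDemo` names below are my own illustration aids (the logic mirrors the implementation above): with a limit of 5 requests per 1-second window, 10 requests slip through within 0.1 seconds of simulated time by straddling a window boundary.

```python
class FakeClock:
    """Manually advanced clock, used only to make the demo repeatable."""
    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now


class FixedWindowDemo:
    """Same fixed window logic as above, with an injectable clock."""

    def __init__(self, max_requests, window_size, clock):
        self.max_requests = max_requests
        self.window_size = window_size
        self.clock = clock
        self.window_start = clock.time()
        self.counter = 0

    def allow_request(self):
        now = self.clock.time()
        if now - self.window_start >= self.window_size:
            self.window_start = now
            self.counter = 0
        if self.counter < self.max_requests:
            self.counter += 1
            return True
        return False


clock = FakeClock()
limiter = FixedWindowDemo(max_requests=5, window_size=1, clock=clock)

# 5 requests at t = 0.9 s, just before the window boundary: all allowed.
clock.now = 0.9
before = sum(limiter.allow_request() for _ in range(5))

# 5 more at t = 1.0 s, right after the counter resets: all allowed again.
clock.now = 1.0
after = sum(limiter.allow_request() for _ in range(5))

# 10 requests accepted within 0.1 s of simulated time,
# double the intended 5-per-second rate.
print(before + after)
```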
4. Sliding Window Log
The sliding window log algorithm:
- Maintains a log of all request timestamps
- When a request arrives:
  - Remove timestamps older than the window size
  - Count the remaining timestamps
  - If the count is below the limit, add the new timestamp and allow
  - If the count is at the limit, reject the request
- The window slides continuously with time
- Provides exact counting within any time window
```python
# sliding_window_log_demo.py
import time
import threading
from collections import deque


class SlidingWindowLog:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a sliding window log rate limiter.

        Args:
            max_requests: Maximum number of requests allowed in the window
            window_size: Size of the sliding window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.requests = deque()
        self.lock = threading.Lock()

    def _clean_old_requests(self):
        """
        Remove timestamps of requests that are outside the current window.
        This method is called before each request check.
        """
        now = time.time()
        while self.requests and now - self.requests[0] >= self.window_size:
            self.requests.popleft()

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed.

        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._clean_old_requests()
            if len(self.requests) < self.max_requests:
                self.requests.append(time.time())
                return True
            return False


if __name__ == "__main__":
    # Create a sliding window log rate limiter
    # with a maximum of 5 requests per second
    # and a window size of 1 second
    limiter = SlidingWindowLog(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests
```
Run the code:

```shell
python sliding_window_log_demo.py
```

```
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Allowed
Request 13: Allowed
Request 14: Allowed
Request 15: Allowed
Request 16: Rejected
Request 17: Rejected
Request 18: Rejected
Request 19: Rejected
Request 20: Rejected
```
Pros
- Most accurate rate limiting: Maintains exact counts of requests within any time window, providing precise rate limiting.
- No boundary bursts: because the window slides continuously, the limit holds over every possible window-sized interval, so it never admits more than the limit in any such interval (unlike the fixed window counter).
- Precise control over request timing: Exact timestamps allow for fine-grained control over request processing.
- Fair distribution of requests: The continuous sliding window ensures fair treatment of all requests regardless of timing.
- Good for high-accuracy requirements: Ideal for systems where precise rate limiting is critical.
Cons
- Memory intensive: Storing all request timestamps can consume significant memory, especially under high load.
- Higher computational overhead: Processing and cleaning up timestamps requires more CPU resources.
- More complex implementation: Managing the timestamp log and window sliding adds complexity to the code.
- Not suitable for high-throughput systems: The memory and computational overhead make it less ideal for very high-volume systems.
- May need cleanup of old timestamps: Requires periodic cleanup of old timestamps to prevent memory growth.
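To give a rough sense of the memory cost, here is a back-of-the-envelope estimate. The load figures are made up for illustration, and 8 bytes per timestamp is just the raw float64 size; CPython's real per-object and deque overhead is several times higher.

```python
# Assumed load profile for the estimate; adjust for your own system.
requests_per_second = 10_000
window_size_seconds = 60
bytes_per_timestamp = 8  # raw float64; CPython's actual per-object cost is higher

timestamps_in_window = requests_per_second * window_size_seconds
raw_bytes = timestamps_in_window * bytes_per_timestamp
print(f"{timestamps_in_window:,} timestamps, at least "
      f"{raw_bytes / 1_000_000:.1f} MB per rate-limited key")
```

And that is per rate-limited key (per user or per API token), so the totals add up quickly at scale.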
5. Sliding Window Counter
The sliding window counter algorithm:
- Maintains two counters: current window and previous window
- When a request arrives:
  - Update window boundaries if needed
  - Calculate the weighted count from both windows
  - If the total is below the limit, increment the current window and allow
  - If the total is at the limit, reject the request
- Uses weighted average to approximate sliding window
- Provides good accuracy with minimal memory usage
```python
# sliding_window_counter_demo.py
import time
import threading


class SlidingWindowCounter:
    def __init__(self, max_requests: int, window_size: int):
        """
        Initialize a sliding window counter rate limiter.

        Args:
            max_requests: Maximum number of requests allowed in the window
            window_size: Size of the sliding window in seconds
        """
        self.max_requests = max_requests
        self.window_size = window_size
        self.current_window = 0
        self.previous_window = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def _reset_window(self):
        """
        Update window counters based on elapsed time.
        This method is called before each request check.
        """
        now = time.time()
        elapsed = now - self.window_start
        if elapsed >= self.window_size:
            # If more than one full window has passed with no requests,
            # the previous window is empty, not the stale current count.
            self.previous_window = self.current_window if elapsed < 2 * self.window_size else 0
            self.current_window = 0
            self.window_start = now

    def allow_request(self) -> bool:
        """
        Check if a new request is allowed using weighted counters.

        Returns:
            bool: True if request is allowed, False if rate limit exceeded
        """
        with self.lock:
            self._reset_window()
            current_time = time.time()
            window_progress = (current_time - self.window_start) / self.window_size
            # Weight the previous window by how much of it still overlaps
            # the sliding window.
            weighted_count = self.previous_window * (1 - window_progress)
            total_count = weighted_count + self.current_window
            if total_count < self.max_requests:
                self.current_window += 1
                return True
            return False


if __name__ == "__main__":
    # Create a sliding window counter rate limiter
    # with a maximum of 5 requests per second
    # and a window size of 1 second
    limiter = SlidingWindowCounter(max_requests=5, window_size=1)

    # Simulate 20 requests
    for i in range(20):
        if limiter.allow_request():
            print(f"Request {i + 1}: Allowed")
        else:
            print(f"Request {i + 1}: Rejected")
        time.sleep(0.1)  # Sleep for 0.1 seconds between requests
```
Run the code:

```shell
python sliding_window_counter_demo.py
```

```
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rejected
Request 7: Rejected
Request 8: Rejected
Request 9: Rejected
Request 10: Rejected
Request 11: Allowed
Request 12: Rejected
Request 13: Allowed
Request 14: Rejected
Request 15: Allowed
Request 16: Rejected
Request 17: Allowed
Request 18: Rejected
Request 19: Allowed
Request 20: Rejected
```
Pros
- Memory efficient: Only needs two counters regardless of request volume, making it very memory efficient.
- More accurate than fixed window: The weighted average approach provides better accuracy than simple fixed windows.
- Good balance of accuracy and performance: Offers a practical compromise between precision and resource usage.
- Suitable for distributed systems: The simple counter-based approach works well in distributed environments.
- Handles burst traffic better than fixed window: The weighted average helps smooth out traffic better than fixed windows.
Cons
- More complex than fixed window: The weighted average calculation adds complexity compared to simple fixed windows.
- Still less accurate than sliding window log: While more accurate than fixed window, it’s not as precise as the sliding window log.
- Requires careful tuning of window size: Finding the optimal window size requires consideration of the specific use case.
- May need synchronization in distributed systems: While suitable for distributed systems, it still requires proper synchronization mechanisms.
- More computational overhead than fixed window: The weighted average calculations add some computational overhead.
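To make the weighted average concrete, here is a worked example with numbers chosen purely for illustration: a 1-second window, 5 requests in the previous window, 2 requests so far in the current window, and the current window 30% elapsed.

```python
max_requests = 5
previous_window = 5    # requests counted in the previous window
current_window = 2     # requests counted so far in the current window
window_progress = 0.3  # 30% of the current window has elapsed

# The sliding window still overlaps 70% of the previous window,
# so 70% of its requests are counted.
weighted_count = previous_window * (1 - window_progress) + current_window
print(round(weighted_count, 3))  # 5 * 0.7 + 2 = 5.5

# 5.5 is not below max_requests, so a new request would be rejected.
allowed = weighted_count < max_requests
print(allowed)
```

The estimate assumes requests were spread evenly across the previous window, which is where the algorithm's approximation error comes from.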