System design - Rate limiter

Contents

Rate Limiter

a rate limiter is used to control the rate of traffic sent by a client or a service.

why

  • prevent Dos attach to prevent prevent server overload
  • reduce cost if using third party api, like retireve balance

implement

  • client side: throttle the button click rate, no guarentee (can be modified source code)
  • server side: most cases implemented within API Gateway

  • hard limit or soft limit

Algorithms

  • Token Bucket
  • Leaking Bucket
  • Fixed Window Counter
  • sliding window log
  • sliding window counter

token bucket: A token bucket is a container that has pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added.

  • If there is enough token, process request and consume one token
  • else: request is denied
  • token bucket

  • TokenBucket(bucketSize, RefillRate)
    • Refill rate: number of tokens put into the bucket every second
  • How many bucket is needed depends on different api, if a user is allowed to make 1 post per second, add 150 friends per day, the we need 2 buckets
Pros:
    1. easy to implement
    2. memory efficient
Cons:
    1. hard to tune the params properly

Leaking bucket: FIFO queue - if queue is not full, add it to queue - else drop it - Requests are pulled from the queue and processed at regular intervals.

  • leakingBucket(BucketSize, OutflowRate):
    • Outflow rate: it defines how many requests can be processed at a fixed rate, usually in seconds.
Pros:
    1. memory efficient given the limited queue size
    2. Requests are processed at a fixed rate therefore it is suitable for use cases that a stable outflow rate is needed.
Cons:
    1. when queue filled with old requests, burst recent traffic will be dropped
    2. hard to tune params

Fixed window counter - The algorithm divides the timeline into fix-sized time windows and assign a counter for each window - Each request increments the counter by one - Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts.

issue: [0.5 : 1] + [1, 1.5] this one second can process 2 * count requests

Pros:
    1. memory efficient
    2. fit certain cases
Cons:
    issue: [0.5 : 1] + [1, 1.5] this one second can process 2 * count requests

sliding window log : fix the issue of fixed window counter - keep track request timestamp in cache (redis) - when new request came, remove all outdated timestamp, eg request comes at 1:01:30, so the frame is [1:00:30 - 1:01:30], remove all timestamp before 1:00:30 - add timestamp to log - if log size <= allowed count, process else rejected

Pros:
    1. accurate rate limit, in any rolling window, request will not exceed limit
Cons:
    1. need memory to store request timestamp

sliding window counter: combine fixed window counter and sliding window log - Requests in current window + requests in the previous window * overlap percentage of the rolling window and previous window - round up or down, then compare with allowed count

Pros:
    1.  It smooths out spikes in traffic because the rate is based on the average rate of the previous window
    2. memory efficient
Cons:
    1. not so strict

implementation Code with Redis

High level architure

in-memory cache like Redis is a good choice, as this is time-based expiration strategy.

high level

Deep Dive

rules:

domain: auth
descriptors:
  - key: auth_type
    Value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5

rules often stored on configs or saved on disk. generally rules will be cached as well.

Headers: client can get information from headers about the rate limit.

  • X-Ratelimit-Remaining: The remaining number of allowed requests within the window.
  • X-Ratelimit-Limit: It indicates how many calls the client can make per time window.
  • X-Ratelimit-Retry-After: The number of seconds to wait until you can make a request again without being throttled.

When a user has sent too many requests, a 429 too many requests error and X-Ratelimit-Retry-After header are returned to the client.

Rate limiter in distributed systems

In distributed systems, generally it will have two problems:

  • Race condition:
    • Lua script
    • sorted set
  • Sync problem:
    • same client to same limiter (not scalable)
    • better solution:
      • use centralized data stores like Redis, all rate limiter fetch data from Redis

performance

  • set up multi-data center to reduce latency
  • sync data in eventual consistency model (trade strict limit for latency)

metric & monitor

  • limit algorithm is effective
  • limit rules are effective

too many valid request are dropped -> relax rule flash sales, burst traffic -> change to token buckets

Improvement with Bloom Filters & Count-Min Sketches

Reference

Use Bloom Filter to track requests and estimate whether a user has made requests in the current time window.

Use Count-min Sketches to count the number of requests for each user, ensuring a precise count within the time window

Bloom Filter Class Implementation

import java.util.BitSet;

public class BloomFilter {
    private BitSet bitSet;
    private int size;
    private int[] hashSeeds; // List of seeds for different hash functions

    public BloomFilter(int size, int[] hashSeeds) {
        this.size = size;
        this.bitSet = new BitSet(size);
        this.hashSeeds = hashSeeds;
    }

    // Insert an item (e.g., User ID) into the Bloom Filter
    public void add(String item) {
        for (int seed : hashSeeds) {
            int hash = hash(item, seed);
            bitSet.set(Math.abs(hash % size));
        }
    }

    // Check if an item exists in the Bloom Filter
    public boolean mightContain(String item) {
        for (int seed : hashSeeds) {
            int hash = hash(item, seed);
            if (!bitSet.get(Math.abs(hash % size))) {
                return false; // If any bit is not set, item is definitely not present
            }
        }
        return true; // Otherwise, item might be present
    }

    // Hashing function
    private int hash(String item, int seed) {
        return item.hashCode() * seed;
    }

    // Reset the Bloom filter for a new time window
    public void reset() {
        bitSet.clear();
    }
}

Count-min Sketches Implementation

import java.util.Random;

public class CountMinSketch {
    private int[][] sketch;
    private int depth;
    private int width;
    private int[] hashSeeds;

    public CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.sketch = new int[depth][width];
        this.hashSeeds = new int[depth];

        // Initialize hash seeds for each row
        Random random = new Random();
        for (int i = 0; i < depth; i++) {
            hashSeeds[i] = random.nextInt();
        }
    }

    // Add an item and increment its count
    public void add(String item) {
        for (int i = 0; i < depth; i++) {
            int hash = hash(item, hashSeeds[i]);
            sketch[i][hash % width] += 1;
        }
    }

    // Estimate the count of an item
    public int estimateCount(String item) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            int hash = hash(item, hashSeeds[i]);
            min = Math.min(min, sketch[i][hash % width]);
        }
        return min;
    }

    // Hashing function
    private int hash(String item, int seed) {
        return item.hashCode() * seed;
    }

    // Reset Count-Min Sketch for a new time window
    public void reset() {
        for (int i = 0; i < depth; i++) {
            for (int j = 0; j < width; j++) {
                sketch[i][j] = 0;
            }
        }
    }
}

Rate Limiter Class Implementation

public class RateLimiter {
    private BloomFilter bloomFilter;
    private CountMinSketch countMinSketch;
    private int maxRequests; // Max allowed requests per user/IP per time window

    public RateLimiter(int bloomFilterSize, int[] hashSeeds, int depth, int width, int maxRequests) {
        this.bloomFilter = new BloomFilter(bloomFilterSize, hashSeeds);
        this.countMinSketch = new CountMinSketch(depth, width);
        this.maxRequests = maxRequests;
    }

    // Process a request from a user and check if it's within the rate limit
    public boolean isRequestAllowed(String userId) {
        if (bloomFilter.mightContain(userId)) {
            // Check exact count in Count-Min Sketch
            int currentCount = countMinSketch.estimateCount(userId);
            if (currentCount >= maxRequests) {
                return false; // Exceeds rate limit
            }
        }

        // If not exceeded, add the request
        bloomFilter.add(userId);
        countMinSketch.add(userId);
        return true; // Request allowed
    }

    // Reset for new time window
    public void reset() {
        bloomFilter.reset();
        countMinSketch.reset();
    }
}

Window Manager Class Implementation

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WindowManager {
    private RateLimiter rateLimiter;
    private int windowDuration; // Duration in seconds for each sliding window

    public WindowManager(RateLimiter rateLimiter, int windowDuration) {
        this.rateLimiter = rateLimiter;
        this.windowDuration = windowDuration;
    }

    public void startWindowRotation() {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        
        // Schedule BloomFilter and CountMinSketch reset at regular intervals
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                rateLimiter.reset();
            }
        }, windowDuration, windowDuration, TimeUnit.SECONDS);
    }
}

Contents