Enterprise Java

Basic API Rate-Limiting

It is likely that you are developing some form of (web/RESTful) API, and in case it is publicly-facing (or even when it’s internal), you normally want to rate-limit it somehow. That is, to limit the number of requests performed over a period of time, in order to save resources and protect from abuse.

This can probably be achieved on web-server/load balancer level with some clever configurations, but usually you want the rate limiter to be client-specific (i.e. each client of your API sohuld have a separate rate limit), and the way the client is identified varies. It’s probably still possible to do it on the load balancer, but I think it makes sense to have it on the application level.

I’ll use spring-mvc for the example, but any web framework has a good way to plug an interceptor.

So here’s an example of a spring-mvc interceptor:

@Component
public class RateLimitingInterceptor extends HandlerInterceptorAdapter {

    private static final Logger logger = LoggerFactory.getLogger(RateLimitingInterceptor.class);
    
    @Value("${rate.limit.enabled}")
    private boolean enabled;
    
    @Value("${rate.limit.hourly.limit}")
    private int hourlyLimit;

    private Map<String, Optional<SimpleRateLimiter>> limiters = new ConcurrentHashMap<>();
    
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws Exception {
        if (!enabled) {
            return true;
        }
        String clientId = request.getHeader("Client-Id");
        // let non-API requests pass
        if (clientId == null) {
            return true;
        }
        SimpleRateLimiter rateLimiter = getRateLimiter(clientId);
        boolean allowRequest = limiter.tryAcquire();
    
        if (!allowRequest) {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        }
        response.addHeader("X-RateLimit-Limit", String.valueOf(hourlyLimit));
        return allowRequest;
    }
    
    private SimpleRateLimiter getRateLimiter(String clientId) {
        if (limiters.containsKey(clientId)) {
            return limiters.get(clientId);
        } else {
            synchronized(clientId.intern()) {
                // double-checked locking to avoid multiple-reinitializations
                if (limiters.containsKey(clientId)) {
                    return limiters.get(clientId);
                }
                
                SimpleRateLimiter rateLimiter = createRateLimiter(clientId);
                
                limiters.put(clientId, rateLimiter);
                return rateLimiter;
            }
        }
    }
	
	@PreDestroy
	public void destroy() {
		// loop and finalize all limiters
	}
}

This initializes rate-limiters per client on demand. Alternatively, on startup you could just loop through all registered API clients and create a rate limiter for each. In case the rate limiter doesn’t allow more requests (tryAcquire() returns false), then raturn “Too many requests” and abort the execution of the request (return “false” from the interceptor).

This sounds simple. But there are a few catches. You may wonder where the SimpleRateLimiter above is defined. We’ll get there, but first let’s see what options do we have for rate limiter implementations.

The most recommended one seems to be the guava RateLimiter. It has a straightforward factory method that gives you a rate limiter for a specified rate (permits per second). However, it doesn’t accomodate web APIs very well, as you can’t initilize the RateLimiter with pre-existing number of permits. That means a period of time should elapse before the limiter would allow requests. There’s another issue – if you have less than one permits per second (e.g. if your desired rate limit is “200 requests per hour”), you can pass a fraction (hourlyLimit / secondsInHour), but it still won’t work the way you expect it to, as internally there’s a “maxPermits” field that would cap the number of permits to much less than you want it to. Also, the rate limiter doesn’t allow bursts – you have exactly X permits per second, but you cannot spread them over a long period of time, e.g. have 5 requests in one second, and then no requests for the next few seconds. In fact, all of the above can be solved, but sadly, through hidden fields that you don’t have access to. Multiple feature requests exist for years now, but Guava just doesn’t update the rate limiter, making it much less applicable to API rate-limiting.

Using reflection, you can tweak the parameters and make the limiter work. However, it’s ugly, and it’s not guaranteed it will work as expected. I have shown here how to initialize a guava rate limiter with X permits per hour, with burstability and full initial permits. When I thought that would do, I saw that tryAcquire() has a synchronized(..) block. Will that mean all requests will wait for each other when simply checking whether allowed to make a request? That would be horrible.

So in fact the guava RateLimiter is not meant for (web) API rate-limiting. Maybe keeping it feature-poor is Guava’s way for discouraging people from misusing it?

That’s why I decided to implement something simple myself, based on a Java Semaphore. Here’s the naive implementation:

public class SimpleRateLimiter {
    private Semaphore semaphore;
    private int maxPermits;
    private TimeUnit timePeriod;
    private ScheduledExecutorService scheduler;

    public static SimpleRateLimiter create(int permits, TimeUnit timePeriod) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(permits, timePeriod);
        limiter.schedulePermitReplenishment();
        return limiter;
    }

    private SimpleRateLimiter(int permits, TimeUnit timePeriod) {
        this.semaphore = new Semaphore(permits);
        this.maxPermits = permits;
        this.timePeriod = timePeriod;
    }

    public boolean tryAcquire() {
        return semaphore.tryAcquire();
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public void schedulePermitReplenishment() {
        scheduler = Executors.newScheduledThreadPool(1);
        scheduler.schedule(() -> {
            semaphore.release(maxPermits - semaphore.availablePermits());
        }, 1, timePeriod);

    }
}

It takes a number of permits (allowed number of requests) and a time period. The time period is “1 X”, where X can be second/minute/hour/daily – depending on how you want your limit to be configured – per second, per minute, hourly, daily. Every 1 X a scheduler replenishes the acquired permits. There is no control for bursts (a client can spend all permits with a rapid succession of requests), there is no warm-up functionality, there is no gradual replenishment. Depending on what you want, this may not be ideal, but that’s just a basic rate limiter that is thread-safe and doesn’t have any blocking. I wrote a unit test to confirm that the limiter behaves properly, and also ran performance tests against a local application to make sure the limit is obeyed. So far it seems to be working.

Are there alternatives? Well, yes – there are libraries like RateLimitJ that uses Redis to implement rate-limiting. That would mean, however, that you need to setup and run Redis. Which seems like an overhead for “simply” having rate-limiting.

On the other hand, how would rate-limiting work properly in a cluster of application nodes? Application nodes probably need some database or gossip protocol to share data about the per-client permits (requests) remaining? Not necessarily. A very simple approach to this issue would be to assume that the load balancer distributes the load equally among your nodes. That way you would just have to set the limit on each node to be equal to the total limit divided by the number of nodes. It won’t be exact, but you rarely need it to be – allowing 5-10 more requests won’t kill your application, allowing 5-10 less won’t be dramatic for the users.

That, however, would mean that you have to know the number of application nodes. If you employ auto-scaling (e.g. in AWS), the number of nodes may change depending on the load. If that is the case, instead of configuring a hard-coded number of permits, the replenishing scheduled job can calculate the “maxPermits” on the fly, by calling an AWS (or other cloud-provider) API to obtain the number of nodes in the current auto-scaling group. That would still be simpler than supporting a redis deployment just for that.

Overall, I’m surprised there isn’t a “canonical” way to implement rate-limiting (in Java). Maybe the need for rate-limiting is not as common as it may seem. Or it’s implemented manually – by temporarily banning API clients that use “too much resources”.

Reference: Basic API Rate-Limiting from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog.

Bozhidar Bozhanov

Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE, as well as Android, Scala and any framework you throw at him. creator of Computoser - an algorithmic music composer. Worked on telecom projects, e-government and large-scale online recruitment and navigation platforms.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Carl J. Mosca
6 years ago

I would think this would be easier to implement and maintain if it were done via an API Gateway (such as Axway, Kong, etc.) assuming you’re on premise or maintaining your own infrastructure.

When I look at cloud providers, such as AWS, they also offer such functionality.

Why do you feel it’s better to code this?

MChisty
MChisty
1 year ago
Reply to  Carl J. Mosca

Taking advantage of existing cloud providers like AWS is more of a admin level job; it is just like learning how to use a tool i.e. configure this and that etc. (The network people, call center operators does same kind of stuff using some dashboard). But as a developer, doing something programmatically helps understanding in depth. A developer should know how an app works behind the scene.

Bozho
6 years ago

Because you have more fine-grained control. At the infrastructure level you can’t have per-client variations of the rate limit

Back to top button