Single machine vs cluster-based rate limiting.
Account-based rate limiting: VIP account vs general account.
Resource-based rate limiting: a specific IP address.
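The account-based and resource-based dimensions above can be combined by keying counters on the thing being limited. What follows is a minimal sketch, not a production design: the tier limits, the IP limit, and the names `KeyedLimiter` and `allow_request` are all hypothetical, and the counters live in local memory, which corresponds to the single-machine case. A cluster-based variant would keep the same counters in a store shared by all machines.

```python
from collections import defaultdict

# Hypothetical limits (requests per window); these numbers are assumptions.
TIER_LIMITS = {"vip": 100, "general": 10}
IP_LIMIT = 20


class KeyedLimiter:
    """Fixed-window counter held in local memory (single-machine scope).

    Old windows are never evicted here; a real limiter would expire them.
    """

    def __init__(self, window_seconds=1.0):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key, limit, now):
        window = int(now // self.window_seconds)
        if self.counts[(key, window)] >= limit:
            return False
        self.counts[(key, window)] += 1
        return True


def allow_request(limiter, account_id, tier, ip, now):
    # Account-based check: the limit depends on the account tier.
    if not limiter.allow(("account", account_id), TIER_LIMITS[tier], now):
        return False
    # Resource-based check: every IP address shares one limit value.
    return limiter.allow(("ip", ip), IP_LIMIT, now)
```

With these assumed numbers, a general account is cut off after 10 requests in a window, while a VIP account is bounded first by the per-IP limit of 20.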
Reject the request directly with HTTP 429 (Too Many Requests).
Turn synchronous requests into asynchronous handling.
Synchronously block until available.
Adjust load balancing mechanism.
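The first three handling options can be compared side by side with a small sketch. It assumes a token bucket as the underlying limiter, which the notes do not prescribe, and the names `TokenBucket`, `handle`, and the strategy strings are hypothetical. Adjusting the load balancer is an infrastructure change and is not shown.

```python
import queue
import time


class TokenBucket:
    """Token bucket limiter; chosen here only for illustration."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_token(self):
        return max(0.0, (1 - self.tokens) / self.rate)


def handle(request, bucket, strategy, backlog, sleep=time.sleep):
    if bucket.try_acquire():
        return 200, "processed"
    if strategy == "reject":
        # Fail fast and let the client retry later.
        return 429, "too many requests"
    if strategy == "async":
        # Park the request; a separate worker would drain the backlog.
        backlog.put(request)
        return 202, "accepted for later processing"
    if strategy == "block":
        # Hold the caller until a token becomes available.
        while not bucket.try_acquire():
            sleep(bucket.seconds_until_token())
        return 200, "processed after waiting"
    raise ValueError(f"unknown strategy: {strategy}")
```

Rejecting keeps latency predictable for accepted requests, queueing trades latency for eventual completion, and blocking is the simplest for callers but ties up a connection while waiting.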
The following response headers could be referenced.
The rate limit threshold also depends on the size of the requests.
If one machine handles all the requests with large payloads while another handles the requests with small payloads, the threshold for each machine should be adjusted accordingly.
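One way to account for payload size is to charge each request a cost proportional to its size instead of counting every request as one. This is only a sketch of that idea: the 64 KiB unit, the per-window budget, and the names `request_cost` and `WeightedWindow` are assumptions made for illustration.

```python
def request_cost(payload_bytes, unit_bytes=64 * 1024):
    """Charge one unit per started block of payload, with a minimum of one."""
    return max(1, -(-payload_bytes // unit_bytes))  # ceiling division


class WeightedWindow:
    """Budget for a single window; the caller resets it when the window rolls."""

    def __init__(self, budget_per_window):
        self.budget = budget_per_window
        self.used = 0

    def allow(self, payload_bytes):
        cost = request_cost(payload_bytes)
        if self.used + cost > self.budget:
            return False
        self.used += cost
        return True

    def reset(self):
        self.used = 0
```

Under these assumptions a 1 MiB request costs 16 units, so a machine that mostly receives large payloads exhausts the same budget after far fewer requests than one serving small payloads.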
Watch the peak-time QPS.
Leave an additional 20% of capacity as headroom. Divide by the number of machines if a per-machine limit is needed.
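The arithmetic can be sketched as follows, reading "additional 20%" as setting the limit 20% above the observed peak; that reading is an assumption, as are the function name `per_machine_limit` and the example numbers.

```python
def per_machine_limit(peak_qps, machines=1, headroom_percent=20):
    """Peak QPS plus headroom, split evenly across machines (rounded up)."""
    numerator = peak_qps * (100 + headroom_percent)
    denominator = 100 * machines
    return -(-numerator // denominator)  # integer ceiling division


print(per_machine_limit(1000))      # 1200 for the whole cluster
print(per_machine_limit(1000, 4))   # 300 per machine
```

Rounding up means the per-machine limits may sum to slightly more than the cluster-wide figure; rounding down would be the more conservative choice.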
After a pressure test, you can get charts like the following:
A: If you want the best resource utilization.
B: If you want the best throughput. This is the tipping point beyond which the system will crash.
C: If you want the best response time.
If service A has a certain conversion rate to service B, then the rate limit for A can be used as a reference when deciding on the limit for B.
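As a rough illustration, if a known fraction of requests to A results in a call to B, then the limit on A bounds the traffic B receives from A. The function name and the numbers below are hypothetical, and the sketch ignores any traffic that reaches B from other sources.

```python
def downstream_limit(upstream_limit_qps, conversion_rate):
    """Expected QPS reaching B when A is capped at upstream_limit_qps."""
    if not 0 <= conversion_rate:
        raise ValueError("conversion_rate must be non-negative")
    return upstream_limit_qps * conversion_rate


# Example: A is limited to 1000 QPS and about 10% of its requests call B.
b_reference_qps = downstream_limit(1000, 0.10)
```

The result is a reference point rather than a final limit; B's own capacity still has to be checked separately.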
The worst case scenario.
Using the example of Lyft Envoy: