API gateway
Overall flowchart
Request path: Client -> Gateway -> Business logic unit (1 ... n) -> Data access layer -> Database
Other components: Control center service, Service registration
Business logic unit 1 (shown in detail) contains:
Thread for business logic
Agent for heartbeat
Agent for restart
Steps:
Step 1. The agent for heartbeat registers IP:Port with service registration and establishes a connection for heartbeat
Step 2. The gateway watches service registration for changes
Step 3. The gateway establishes long connections with the business logic units
Step 4. The agent/process for business logic dies for some reason
Step 5. Command to restart business logic 1 (on the link between the gateway and the control center service)
Step 6. The control center service restarts business logic unit 1 through the unit's agent for restart, which performs:
a). Kill agent for heartbeat
b). Sleep long enough to wait for removal of the entry within service registration
c). Restart the unit
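The three actions of the agent for restart (kill the agent for heartbeat, wait for the entry to disappear from service registration, restart the unit) only work in that order. A minimal Python sketch, assuming the registry drops an entry after a heartbeat TTL; the names `restart_unit`, `registry_ttl_seconds` and `safety_factor` are hypothetical and not taken from the original design:

```python
import time
from typing import Callable


def restart_unit(
    kill_heartbeat_agent: Callable[[], None],
    start_unit: Callable[[], None],
    registry_ttl_seconds: float,
    safety_factor: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Restart a business logic unit without routing traffic to a dead process."""
    # a) Stop sending heartbeats so the registry entry can expire.
    kill_heartbeat_agent()
    # b) Wait longer than the (assumed) registry TTL so the gateway,
    #    which watches the registry, stops seeing this unit.
    sleep(registry_ttl_seconds * safety_factor)
    # c) Only now bring the unit back; it re-registers on startup.
    start_unit()
```

The `sleep` parameter is injectable only so the order of the steps can be checked without actually waiting.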
Gateway architecture
Evolution history
Initial architecture
Only needs to support web browsers

BFF (Backend for Frontend) layer
The BFF layer exists to handle the following:
Security logic: Exposing internal services directly on the web creates security risks. The BFF layer hides these internal services.
Aggregation/Filter logic: Wireless clients typically need filtering (e.g. cropping images to fit the device size) and fitting (client-specific customization). The BFF layer performs these operations.
However, over time the BFF accumulates both business logic and cross-cutting logic.

Gateway layer and Cluster BFF Layer
The BFF contains too much cross-cutting logic, such as:
Rate limiting
Auth
Monitoring
A gateway is introduced to handle these cross-cutting concerns.
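To make the split concrete, here is a minimal Python sketch of a gateway that applies rate limiting and auth before a request reaches the BFF. The token bucket algorithm, the `TokenBucket` and `gateway_handle` names, and the dict-shaped request are all assumptions chosen for illustration; the notes do not say which algorithm or API a real gateway uses.

```python
import time
from typing import Callable, Set, Tuple


class TokenBucket:
    """Token bucket rate limiter: refills `rate_per_sec` tokens up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def gateway_handle(request: dict, bucket: TokenBucket, valid_tokens: Set[str],
                   bff: Callable[[dict], str]) -> Tuple[int, str]:
    """Cross-cutting checks live here; the BFF only sees accepted requests."""
    if not bucket.allow():
        return (429, "rate limit exceeded")
    if request.get("token") not in valid_tokens:
        return (401, "unauthorized")
    return (200, bff(request))
```

With this shape the BFF function contains only business logic, and monitoring could be added in `gateway_handle` the same way.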

Clustered BFF and Gateway layer
Clustering is introduced to remove the single point of failure.

Gateway vs reverse proxy
Web age: Reverse proxies (e.g. HAProxy/Nginx) have existed since the web age.
However, in the microservice age, quick iteration requires dynamic configuration.
Microservice age: The gateway is introduced to support dynamic configuration.
However, in the cloud native age, the gateway also needs to support dynamic, programmable traffic control such as blue-green deployment.
Cloud native age: Service mesh and Envoy are proposed because of this.

Reverse Proxy (Nginx)
Use cases
Serve from a distributed cache while skipping application servers: Lua scripts running on Nginx can read Redis directly, so responses are served from Nginx instead of from the web application (e.g. Java services, whose optimization is complicated by JVM tuning and multithreading)
Provide high availability for backend services
Failover config: proxy_next_upstream. The failure types that trigger failover can be customized, such as error, timeout, and specific HTTP status codes (e.g. http_502, http_404)
Avoid failover avalanche config: proxy_next_upstream_tries limits the number of tries for passing a request to the next server
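The interaction between the two settings can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not Nginx's implementation; `proxy_with_failover`, `retry_on` and `max_tries` are hypothetical names, and in this sketch `max_tries` counts every attempt including the first.

```python
from typing import Callable, Optional, Sequence, Set, Tuple


def proxy_with_failover(
    upstreams: Sequence[str],
    send: Callable[[str], int],
    retry_on: Set[int],
    max_tries: int,
) -> Optional[Tuple[int, str]]:
    """Try upstreams in order; fail over only on statuses listed in `retry_on`.

    Returns (status, upstream) of the last attempt, or None if nothing was tried.
    """
    last: Optional[Tuple[int, str]] = None
    for attempt, upstream in enumerate(upstreams):
        if attempt >= max_tries:
            # The cap keeps one bad request from hammering every backend.
            break
        status = send(upstream)
        last = (status, upstream)
        if status not in retry_on:
            # Success, or a failure type that is not configured for failover.
            return last
    return last
```

Without the cap, a request that fails everywhere is retried against every server, which is how a partial outage can turn into an avalanche.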
Gateway internals
API Gateway has become a pattern: https://freecontent.manning.com/the-api-gateway-pattern/
Please see this comparison (in Chinese)

Gateway comparison

Service discovery
Approach - Hardcode service provider addresses
Pros:
Updates are much faster
Cons:
The load balancer can easily become a single point of failure.
The load balancing strategy is inflexible in microservice scenarios. TODO: Details to be added.
All traffic has to pass through the load balancer, which adds some performance cost.
Flow:
Service consumer -> DNS server
Service consumer -> Load balancer (holds service provider 1 address, service provider ... address, service provider N address) -> Service provider 1, Service provider ..., Service provider N
Approach - Service registration center
Pros:
No single point of failure.
No additional hop for load balancing
For details on service registration implementation, please refer to [Service registration center](https://github.com/DreamOfTheRedChamber/system-design/blob/master/serviceRegistry.md)
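The "no additional hop" point follows from the consumer keeping its own copy of the provider list and choosing a provider locally. A minimal Python sketch, assuming the registration center pushes updates to the consumer and that round-robin is the balancing strategy; `RegistryClient`, `on_registry_update` and the second address are hypothetical:

```python
from typing import Dict, Iterable, List


class RegistryClient:
    """Consumer-side copy of the registration list with local round-robin."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}
        self._counters: Dict[str, int] = {}

    def on_registry_update(self, service: str, addresses: Iterable[str]) -> None:
        # Called when the registration center reports a change (assumed push model).
        self._providers[service] = list(addresses)

    def pick(self, service: str) -> str:
        addresses = self._providers.get(service)
        if not addresses:
            raise LookupError(f"no provider registered for {service}")
        i = self._counters.get(service, 0)
        self._counters[service] = i + 1
        # The consumer connects to this address directly: no load balancer hop.
        return addresses[i % len(addresses)]
```

For example, after `on_registry_update("addToCart", ["192.168.1.2:9080", "192.168.1.3:9080"])`, successive `pick("addToCart")` calls alternate between the two addresses.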
How to detect failure
Heartbeat messages: TCP connect, HTTP, HTTPS
Failure detection should not rely only on heartbeat messages; it should also check the application's health. A node may keep sending heartbeats while the application has stopped responding for some reason (pseudo-dead).
Detect failure
Centralized and decentralized failure detection: https://time.geekbang.org/column/article/165314
Heartbeat mechanism: https://time.geekbang.org/column/article/175545
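The difference between a dead node and a pseudo-dead node can be sketched by tracking two timestamps per node. This is an illustrative Python sketch; `NodeStatus`, `classify` and the two timeout parameters are hypothetical names, and the source does not prescribe how the application health probe is performed.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    last_heartbeat: float   # time the last heartbeat message arrived
    last_health_ok: float   # time the application last passed a health probe


def classify(node: NodeStatus, now: float,
             heartbeat_timeout: float, health_timeout: float) -> str:
    """Combine heartbeat and application health into one verdict."""
    if now - node.last_heartbeat > heartbeat_timeout:
        return "dead"
    if now - node.last_health_ok > health_timeout:
        # Heartbeats still arrive, but the application is not responding.
        return "pseudo-dead"
    return "healthy"
```

A registry that only looked at `last_heartbeat` would report the pseudo-dead case as healthy and keep routing traffic to it.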
How to gracefully shutdown
Problem: Two RPC calls are involved in the process, so clients may still send requests to a node that has already started shutting down
Service provider notifies the registration center that certain nodes are going offline
Registration center notifies clients to remove those nodes from the clients' copy of the service registration list
Within the shutdown hook (e.g. Java's Runtime.addShutdownHook method):
Step 1. Turn on the shutdown flag when the hook is triggered
Step 2. Handle requests depending on when they arrived:
Requests that arrived before the flag was turned on: handled before the machine is closed
New requests: notify the caller about the closure
Step 3. Close the machine
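The shutdown-flag flow (turn on a flag in the hook, tell new callers about the closure, then close) can be sketched in Python as follows. The sketch assumes that requests already in flight are allowed to finish before the machine is closed; `GracefulServer` and its method names are hypothetical.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class GracefulServer:
    """Rejects new requests once shutdown starts and waits for in-flight ones."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._shutting_down = False
        self._in_flight = 0

    def handle(self, work: Callable[[], Any]) -> Tuple[str, Any]:
        with self._lock:
            if self._shutting_down:
                # New request after the flag: notify the caller about the closure.
                return ("rejected", "node is shutting down, retry on another node")
            self._in_flight += 1
        try:
            return ("ok", work())
        finally:
            with self._idle:
                self._in_flight -= 1
                self._idle.notify_all()

    def shutdown(self, timeout: Optional[float] = None) -> bool:
        """Turn on the flag, then wait until in-flight requests have drained.

        Returns True if the node is idle and can be closed, False on timeout.
        """
        with self._idle:
            self._shutting_down = True
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)
```

In a Python process, `atexit.register(server.shutdown)` would play a role similar to Java's `Runtime.addShutdownHook`; a rejected caller can then retry against another node from its registration list.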
How to gracefully start
Problem: If a service provider node receives a large volume of traffic without pre-warming, it can easily fail. How do we make sure a newly started node does not receive a large volume of traffic?
Within the start hook:
Step 1. Service provider node starts
Step 2. Register the node info and start time within the registration center, for example:
Service: addToCart
Address: 192.168.1.2:9080
StartTime: 02172020-11:34pm
Step 3. Adaptive load balancer adjusts the weight based on the start time: +10% weight every certain period
Step 4. Finished pre-warm
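The "+10% weight every certain period" rule can be written as a small function of the registered start time. A minimal Python sketch; the function name, the `period_seconds` parameter, and the choice to start a new node at 10% rather than 0% are assumptions, since the notes do not give the starting weight or the period length.

```python
def warmup_weight_percent(now: float, start_time: float,
                          period_seconds: float, step_percent: int = 10) -> int:
    """Load balancer weight (in percent of full weight) for a warming-up node."""
    elapsed = max(0.0, now - start_time)
    completed_periods = int(elapsed // period_seconds)
    # Assumed: the node starts at one step and gains one step per period,
    # capped at 100% when the pre-warm is finished.
    return min(100, step_percent * (completed_periods + 1))
```

A load balancer would read the start time from the registration center and use this weight when distributing requests, so a fresh node sees only a small share of traffic at first.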
Further reading