Load balancing is a widely used mechanism for managing traffic. It distributes incoming requests efficiently and evenly across multiple servers. In this blog, we will discuss how a software-based load balancer can be used on a high-traffic website that accepts millions of concurrent requests from different users while still returning fast and accurate data.
I will focus specifically on my team’s direct experience in implementing a redirector for one of eInfochips’ clients, a leading smart home automation player. The client’s users access the website to manage and operate smart home appliances. There are typically two types of load balancers: hardware-based and software-based. A hardware-based load balancer is a vendor-supplied machine with specialized processors and the load balancing software built in, whereas a software-based solution runs on any commodity hardware. Here, we will mainly focus on the advantages of a software load balancer and the things to take care of when using one.
Advantages of software load balancing over hardware
- It is extremely cheap (almost 75% cheaper than a hardware-based solution). Also, the cost does not increase as the traffic increases.
- It performs well under load. With a hardware-based load balancer, it has been observed that TCP connections are not stable when the load is extremely high.
- Deployment is flexible and easy, and it can be deployed anywhere.
- The setup can be done much faster.
- It is much easier to configure, and it can be configured to suit the application’s needs.
Client-side random load balancing
This refers to an approach where the list of server IPs is delivered to the client, which then picks a random IP from the available list (using whatever selection algorithm is in place) and establishes the connection to it. The Curator API provides this facility: through it, a server can register itself with, or deregister itself from, Zookeeper. Zookeeper then returns the list of available servers to the redirector. The redirector selects a random server and a connection is established to that server.
Each server keeps track of its number of established connections so that it can deregister itself (remove its name from the list of available servers) once it reaches the maximum number it can handle. This number is determined by load and stress testing the application. Deregistering does not affect connections that are already established. It only prevents new connections until the server is again capable of handling them. With this approach, the first thing a server does on start-up is register itself, and it similarly deregisters on shutdown. Zookeeper also covers the failure case automatically: it deregisters a server that is no longer available.
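The register, deregister, and random-pick cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `Registry` is an in-memory stand-in for the list that Zookeeper maintains, not the Curator API, and the names `Server`, `pick_server`, and `max_connections` are hypothetical.

```python
import random


class Registry:
    """In-memory stand-in for the list of available servers kept in Zookeeper."""

    def __init__(self):
        self._available = set()

    def register(self, ip):
        self._available.add(ip)

    def deregister(self, ip):
        self._available.discard(ip)

    def available(self):
        return sorted(self._available)


class Server:
    """Hypothetical server that tracks its own established connections."""

    def __init__(self, ip, registry, max_connections):
        self.ip = ip
        self.registry = registry
        # Assumed to come from load and stress testing.
        self.max_connections = max_connections
        self.connections = 0
        registry.register(ip)  # register on start-up

    def connect(self):
        self.connections += 1
        # At capacity: stop accepting new connections, keep existing ones.
        if self.connections >= self.max_connections:
            self.registry.deregister(self.ip)

    def disconnect(self):
        if self.connections > 0:
            self.connections -= 1
        # Capacity is free again: rejoin the list of available servers.
        if self.connections < self.max_connections:
            self.registry.register(self.ip)

    def shutdown(self):
        self.registry.deregister(self.ip)  # deregister on shutdown


def pick_server(registry, rng=random):
    """Redirector step: choose a random IP from the available servers."""
    servers = registry.available()
    if not servers:
        return None
    return rng.choice(servers)
```

A server that reaches its limit simply drops out of the list that `pick_server` draws from, so existing connections are untouched while new ones go elsewhere.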
Server-side load balancing
In server-side load balancing, the load balancer is the actual server that serves the request: it listens on the port to which the client connects in order to access the website. The selected server does not serve every incoming request. It has rules configured on it that restrict excess connections. There are two main types of rule. The first is the maximum number of connections that can be established by different users at a given time. The second is the maximum number of requests coming from a single client, that is, from the same IP. With these rules in place, we can reject a request directly (before it enters the server) and refuse to establish connections beyond the defined limit. These limits are configurable and are set according to the application’s actual traffic. This mechanism is often known as throttling.
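As a rough illustration of the two rule types, the sketch below tracks a global connection count and a per-IP count and rejects anything beyond either limit. The class name `ConnectionThrottle`, its parameters, and the choice to count concurrent connections per IP are assumptions made for this example, not the configuration used in the project.

```python
from collections import defaultdict


class ConnectionThrottle:
    """Hypothetical throttle enforcing a total limit and a per-IP limit."""

    def __init__(self, max_total, max_per_ip):
        self.max_total = max_total      # rule 1: connections across all users
        self.max_per_ip = max_per_ip    # rule 2: connections from a single IP
        self.total = 0
        self.per_ip = defaultdict(int)

    def try_accept(self, ip):
        """Return True and count the connection, or False to reject it."""
        if self.total >= self.max_total:
            return False
        if self.per_ip[ip] >= self.max_per_ip:
            return False
        self.total += 1
        self.per_ip[ip] += 1
        return True

    def release(self, ip):
        """Free the slot held by a closed connection from this IP."""
        if self.per_ip.get(ip, 0) > 0:
            self.per_ip[ip] -= 1
            self.total -= 1
```

Both limits are plain constructor arguments here, which mirrors the point that the numbers are configurable and should follow the application's measured traffic.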
SSL offloading: Encrypting and decrypting SSL requests is a major concern for a home automation application. Here, SSL offloading (the encryption/decryption work) can be done on the distributed servers themselves. The downside of this approach is the overhead: it puts extra load on the servers. Unless a server is overloaded, however, the performance cost is not large enough for an end user to notice.
Multiple servers register in the Zookeeper ensemble with self-identifying metadata (cluster ID and IP address). A device makes a call to the redirector. The redirector uses the rules in Zookeeper to respond with the IP address of a server.
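The lookup step in this flow might look something like the sketch below. The source does not spell out the redirector's rules, so the rule used here (match the device's cluster ID, then pick a random server in that cluster) is purely an assumption, as are the function name `redirect` and the `cluster_id` and `ip` keys.

```python
import random


def redirect(device_cluster_id, registrations, rng=random):
    """Return the IP of a registered server in the device's cluster, or None.

    `registrations` stands in for the metadata the servers stored in
    Zookeeper: a list of dicts with hypothetical keys "cluster_id" and "ip".
    """
    candidates = [
        entry["ip"]
        for entry in registrations
        if entry["cluster_id"] == device_cluster_id
    ]
    if not candidates:
        return None
    return rng.choice(candidates)


# Example registrations, with made-up cluster IDs and addresses.
registrations = [
    {"cluster_id": "c1", "ip": "10.0.0.1"},
    {"cluster_id": "c1", "ip": "10.0.0.2"},
    {"cluster_id": "c2", "ip": "10.0.1.1"},
]
```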
Terms and references
- Zookeeper: Software that provides a registry of distributed servers.
- Curator API: A high-level API built on Zookeeper that handles managing connections to Zookeeper.
- Load and stress: A kind of testing performed to determine the maximum capacity of an application.
Also, check out our previous blog on the OAuth 2.0 framework, which gives an overview of how third-party applications are authorized to access an HTTP service.