- Low-latency web application (meaning low page load times)
- Application that can serve an ever-increasing number of users (scalability)
- Application that does not go down (either highly available or continuously available)
For each of the above, as an architect you need to dig deeper to find out what the user is actually asking for. With the advent of cloud, every CIO is looking to build applications that meet all of the above scenarios. With elastic compute available, it is tempting to think that throwing hardware at the application will achieve all of these objectives.
The techniques used to achieve each of these objectives are at times different, and it is important to find the approach that fits the objective at hand. We will examine some of the common techniques that can help us achieve them.
Application Tiering – One of the biggest contributors to latency is application tiering. The hops from web server -> application server -> database and back, along with data serialization/deserialization, are some of the biggest contributors to overall latency. Having the web and application tiers on the same box, or even within the same JVM, can help reduce the network latency factor. One can keep a logical separation in the application code between the web tier and the application tier without a physical separation. Using a Spring container that hosts both the web and app tiers can help achieve this. If the application uses SOA and makes multiple web service or JMS message calls, network latency and data serialization once again add to the latency. Solutions like IBM DataPower XML accelerators can be used to reduce the XML overhead. Similarly, the application can use Solace message routers to speed up messaging.
Disk I/O – Another weak link in the application performance chain is disk I/O. One way to overcome disk I/O limitations is to keep data in memory. In-memory databases (like VoltDB, solidDB or Oracle TimesTen) and XTP (extreme transaction processing) solutions (like Oracle Coherence, IBM eXtreme Scale, GigaSpaces eXtreme Application Platform) can be used to speed up application performance.
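To make the idea of keeping data in memory concrete, here is a minimal sketch of a read-through cache placed in front of a slow, disk-backed lookup. It is not how any of the products above work internally; the class name `ReadThroughCache`, the `load_from_disk` callback and the 60-second expiry are assumptions chosen for illustration.

```python
import time


class ReadThroughCache:
    """Serves repeat reads from memory and falls back to the slow store on a miss."""

    def __init__(self, load_from_disk, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_disk      # hypothetical slow lookup (disk, database, ...)
        self._ttl = ttl_seconds          # how long a cached value is considered fresh
        self._clock = clock              # injectable so expiry can be tested
        self._entries = {}               # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the slow store once and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

The hit and miss counters matter as much as the cache itself: a low hit ratio means the memory is being spent without removing much disk I/O.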
Optimized Hardware – The hardware on which the application is hosted can also be tuned to reduce latency. Optimizations like 10G/20G networks, Fibre Channel, low-latency switches, SSDs (solid-state drives) and avoiding virtualization can help reduce application latency.
Transport Mechanism – At times, the transport mechanism can also add to application latency. E.g. secure communication (like HTTPS) adds latency through the additional overhead of decrypting the data at the receiving end. One way to reduce this is to offload SSL at the load balancer/firewall.
In the end, you need to measure anything and everything to find and address the bottlenecks. Once the obvious bottlenecks have been addressed, one can start looking at things like cache thrashing, poor algorithms, data bloat, wrong dimensioning etc. to squeeze out that last ounce of performance. Not all of the techniques mentioned will be applicable in every scenario; the architect needs to take a call based on the latency requirements.
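As a starting point for that measurement, the sketch below times repeated calls and reports percentiles rather than an average, since a few slow requests are easy to hide in a mean. The helper names `time_calls` and `percentile` are hypothetical, and the nearest-rank method is just one common way to compute a percentile.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct in 1..100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, runs=100):
    """Calls fn repeatedly and returns each call's duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


if __name__ == "__main__":
    # Stand-in workload; replace with a real request against the application.
    samples = time_calls(lambda: sum(range(10_000)), runs=200)
    print("p50 ms:", percentile(samples, 50))
    print("p99 ms:", percentile(samples, 99))
```

Comparing p50 with p99 before and after each change shows whether a tuning step helped typical requests, the slow tail, or neither.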
Scalability means the ability of an application to handle a growing amount of data and concurrency efficiently without impacting performance. The important thing to notice is that scalability should not come at the cost of application performance. Some of the techniques that can help scale the application:
Stateless Application/Service – The application should store its state in a centralized repository, but the application itself should be stateless. This means not storing data or state on local file systems. A stateless application allows one to add any number of application instances to accommodate growth. But soon the centralized repository becomes the bottleneck. With ever-increasing data, repositories like an RDBMS may start to buckle. One approach to this issue is to minimize mutable state in the database. Where that is not enough, techniques like data sharding need to be applied. Another approach to managing write contention in the database is to consider NoSQL data stores for some or all of the application data.
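Data sharding can be illustrated with a small sketch that routes each key to one of several stores by hashing it. This is a simplified illustration, not a production design: `shard_for` and `ShardedStore` are hypothetical names, and plain dictionaries stand in for separate database instances.

```python
import hashlib


def shard_for(key, shard_count):
    """Maps a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Spreads records over several independent stores (dicts stand in for databases)."""

    def __init__(self, shard_count):
        self._shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key):
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def sizes(self):
        return [len(shard) for shard in self._shards]
```

One trade-off of this modulo scheme is that changing the number of shards remaps most keys; schemes such as consistent hashing are commonly used to limit that movement.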
Load Balancing – As traffic goes up, the application can handle the additional load by adding server instances to service the requests. The load balancer makes sure none of the servers works beyond its stated load, and new instances should be added automatically as the load goes up (auto scaling). One can also load balance the database with techniques like a master-master topology or master-slave (with read and write traffic partitioned) to handle the additional load. But if the data runs into the petabyte range, data sharding with data replication needs to be used. An in-memory data grid architecture can also be utilized to scale the data.
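The behaviour described here, sending each request to the least busy server and treating a fully loaded pool as the signal to scale out, can be sketched as follows. The `LoadBalancer` class and its `max_per_server` limit are assumptions for illustration; real load balancers and auto-scaling services expose this differently.

```python
class LoadBalancer:
    """Routes each request to the server with the fewest requests in flight."""

    def __init__(self, servers, max_per_server):
        self.max_per_server = max_per_server          # stated load limit per server
        self.active = {name: 0 for name in servers}   # server -> requests in flight

    def acquire(self):
        """Returns a server name, or None when every server is at its limit."""
        name = min(self.active, key=self.active.get)
        if self.active[name] >= self.max_per_server:
            return None  # pool is saturated: the caller should trigger scale-out
        self.active[name] += 1
        return name

    def release(self, name):
        """Marks one request on the given server as finished."""
        self.active[name] -= 1

    def add_server(self, name):
        """Registers a newly started instance so it can receive traffic."""
        self.active.setdefault(name, 0)
```

A caller that receives `None` would start a new instance, call `add_server`, and retry the request.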
Fault Tolerance / Dynamic Discoverable Elements – When dealing with an application running in large clusters, it is very important to avoid manual intervention. E.g. when the application load reaches a defined threshold, application monitoring should be able to add a new instance, and the load balancer should be able to recognize and use it. Similarly, when data gets sharded, the applications should be able to recognize this and look up the new IP to connect to. Likewise, if the application is not able to connect to a particular resource, it should be intelligent enough to recognize the fault and try an alternate resource. The application will need a central metadata repository that it can tap for all such fault tolerance scenarios.
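A minimal sketch of such a metadata repository, assuming an in-process `Registry` class and a `call_with_failover` helper (both hypothetical names), shows how an application can look up endpoints at call time and move on to an alternate when one is unreachable.

```python
class Registry:
    """Central lookup of the endpoints currently serving each named resource."""

    def __init__(self):
        self._endpoints = {}  # resource name -> list of endpoint addresses

    def register(self, resource, endpoint):
        self._endpoints.setdefault(resource, []).append(endpoint)

    def deregister(self, resource, endpoint):
        self._endpoints.get(resource, []).remove(endpoint)

    def lookup(self, resource):
        return list(self._endpoints.get(resource, []))


def call_with_failover(registry, resource, request):
    """Tries each registered endpoint in turn until one answers."""
    last_error = None
    for endpoint in registry.lookup(resource):
        try:
            return request(endpoint)
        except ConnectionError as exc:
            last_error = exc  # remember the fault and try the next endpoint
    raise ConnectionError("no reachable endpoint for %r" % resource) from last_error
```

Because the lookup happens on every call, a newly registered instance or a relocated shard is picked up without a restart or manual reconfiguration.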
Availability of an application is very much a function of scalability. The following factors have an impact on application availability:
Redundancy – The application needs to be scalable to be able to compensate for the loss of any instance (whether hardware or software). Redundancy needs to be built in at all layers: software, hardware, power and even at the data center level. Even if a data center goes down, the user should be able to access the application. Many a time, the level of redundancy and the resulting downtime are a function of how much money is being thrown at the solution. Remember that some problems have no solution within the context of today's technology, e.g. real-time data mirroring or data sync across data centers that are geographically far apart.
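The link between redundancy, downtime and cost can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes instance failures are independent, which shared power, network or software faults often violate in practice, so the result is an upper bound rather than a prediction. The function names are hypothetical.

```python
def combined_availability(instance_availability, instances):
    """Availability of N redundant instances when any single one can serve traffic.

    Assumes failures are independent, which is optimistic in practice.
    """
    return 1.0 - (1.0 - instance_availability) ** instances


def downtime_hours_per_year(availability, hours_per_year=24 * 365):
    """Expected unavailable hours per year for a given availability fraction."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)
        print(n, "instance(s):", round(downtime_hours_per_year(a), 3), "hours/year")
```

Under these assumptions, each additional instance multiplies the remaining downtime by the failure probability of a single instance, which is why the first redundant copy buys far more than the third or fourth.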
Fault Tolerance – The application needs to be fault tolerant (e.g. through a retry mechanism) to make sure it can take advantage of dynamically allocated resources and keep functioning.
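A retry mechanism of the kind mentioned above might look like the following sketch. The `retry` helper, its default of three attempts and the doubling delay are assumptions for illustration; suitable values depend on the resource being called.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep,
          retry_on=(ConnectionError, TimeoutError)):
    """Runs operation, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Wait 0.1s, 0.2s, 0.4s, ... so a struggling resource gets room to recover.
            sleep(base_delay * 2 ** attempt)
```

Retries are only safe for operations that can be repeated without side effects, and only for errors that are plausibly transient; anything else should fail fast.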
Monitoring/Testing – Another overlooked factor in application availability is application monitoring. If the application is not properly monitored, outages can go undetected, leading to application unavailability. The ability to monitor the entire application stack and take corrective action is very important. This capability is built over a period of time. Once the application has monitoring and auto-scaling features, testing to make sure they work is also important. Something like Chaos Monkey, used by Netflix, is very helpful.
Configuration Data – Any application that needs to be continuously available needs to be driven by configuration. E.g. if the application introduces a new service interface, it should have the ability to either make use of the new interface or keep using the old one. This becomes very important when rolling out new features/services that cannot all be rolled out at once.
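One simple way to get this behaviour is a configuration flag that decides at call time which interface to use. The sketch below is illustrative only: the flag name `use_new_quote_service` and the two quote functions are hypothetical stand-ins for an old and a new service interface.

```python
import json


def old_quote(amount):
    """Stand-in for the existing service interface."""
    return {"total": amount}


def new_quote(amount):
    """Stand-in for the newly introduced service interface."""
    return {"total": amount, "currency": "USD"}


def load_flags(text):
    """Parses configuration supplied as JSON text (file, environment, config service)."""
    return json.loads(text)


def quote_service(flags):
    """Picks the interface to call; defaults to the old one when the flag is absent."""
    return new_quote if flags.get("use_new_quote_service", False) else old_quote


if __name__ == "__main__":
    flags = load_flags('{"use_new_quote_service": true}')
    print(quote_service(flags)(100))
```

Defaulting to the old interface means a missing or partially rolled-out configuration leaves the application on the known-good path.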