Adaptive Microservices
Gary Leeson. 2020/12/10
“She can’nae take anymore, Captain!!” shouts Scotty as he enters my office. “The system is overloaded because of the number of new customers sales have managed to sign up this quarter.” “I know,” I reply. “Datadog just alerted me via PagerDuty, but there is little we can do at the moment. We’re scaled out to the limit we are allowed, and we will have to ask management for more resources.”
Engineers in a micro-service-based ecosystem have all experienced the above scenario more than once. It’s not pleasant when you are the one in the eye of the storm, but what can we do to help ameliorate the situation?
— -
What is an Adaptive System
According to Wikipedia, an adaptive system is a ‘set of interacting or interdependent entities, real or abstract, forming an integrated whole that together is able to respond to environmental changes or changes in the interacting parts’. In a micro-service architecture, this can mean that the application (a collection of web services and apps) or the individual services themselves react and adapt to changes in the environment.
What we are NOT referring to here is what the application is actually designed to do (user management or hotel room booking, for example). What we ARE referring to is adapting to changes in CPU load, disk usage, network usage, and service availability, amongst others.
Adaptive system behavior can occur at two main levels: the macro level and the micro level. Macro-level adaptation occurs at the system and application level (multi-service or multi-service-instance). At this level, if you are using Docker and Kubernetes (who isn’t these days?), an individual service that is under ‘stress’ can be scaled out automatically. Kubernetes is excellent at this type of system-level monitoring and instance management.
Micro-level adaptation occurs within an individual micro-service, where ‘sensors’ that hook into host information are linked into the service itself (a bit like an organ or cell within an animal). Common ‘sensors’ cover CPU load, disk usage/availability, and network bandwidth/usage; but you could also have ‘sensors’ that access data from a central management system or dashboard for broader effects.
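As a rough illustration of such a ‘sensor’, here is a minimal sketch in plain Java. The class name `CpuLoadSensor` is hypothetical, and the sketch assumes a HotSpot/OpenJDK-style JVM that exposes the `com.sun.management.OperatingSystemMXBean` extension; on other JVMs it simply reports zero load.

```java
import java.lang.management.ManagementFactory;
import java.util.function.DoubleSupplier;

/** Hypothetical 'sensor' reporting host CPU load as a fraction between 0.0 and 1.0. */
final class CpuLoadSensor implements DoubleSupplier {

    // JDK-specific extension of the standard bean; null when the JVM does not provide it.
    private final com.sun.management.OperatingSystemMXBean osBean;

    CpuLoadSensor() {
        java.lang.management.OperatingSystemMXBean bean =
                ManagementFactory.getOperatingSystemMXBean();
        this.osBean = (bean instanceof com.sun.management.OperatingSystemMXBean)
                ? (com.sun.management.OperatingSystemMXBean) bean
                : null;
    }

    @Override
    public double getAsDouble() {
        if (osBean == null) {
            return 0.0; // assumption: treat "unknown" as "no load"
        }
        // Deprecated since Java 14 in favour of getCpuLoad(), but present on older JDKs too.
        double load = osBean.getSystemCpuLoad();
        if (Double.isNaN(load) || load < 0.0) {
            return 0.0; // a negative value means the figure is not available yet
        }
        return Math.min(load, 1.0);
    }
}
```

Exposing the sensor as a `DoubleSupplier` keeps the rest of the service independent of where the number comes from, so a fixed value can be substituted in tests.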
An obvious question is: what do we do with this ability to sense our environment at the micro level, and what does it give us, the humble developer or company? Those questions take a bit more consideration, and the answers are highly dependent on your application and deployment model. A good start is to develop a number of stress scenarios in conjunction with the quality, development, and systems architecture teams. You need plans for how to react when the tornado happens (and it WILL happen), and you need to know which downstream services your service interacts with. Often this plan comes down to making sure you have provisioned enough instances (in the right geographic locations), allocated enough disk storage, and so on. From an adaptive microservice perspective, a good question to ask is: ‘Is it possible for an individual service to help ameliorate things when it is itself ‘stressed’?’
Let us take a step back and think about web services. In an ideal world, a web service does only one thing and has no downstream dependencies, but most web services do have them: a database, a Kafka/JMS queue, or calls to other web services. What happens when one of these dependencies goes down or is stressed? This is often where you implement a ‘circuit breaker’ pattern using something like Hystrix, which is an example of micro-level adaptive behavior that reacts to network-related problems. Other possible behaviors are:
1. Dynamically change the amount of logging done based on system or disk load: less logging when the system is under stress, more when the load is light.
2. Not all downstream dependencies/calls are actually necessary (though they might be nice to have). Stop doing (or postpone) the ones that are not necessary when the system is under stress. This obviously has to be understood and agreed upon by all the stakeholders.
3. Start rejecting calls when things get stressed. This could be all or a variable percentage of the calls. You could link this with a load balancer which redirects calls to another instance if the original target service returns an overload error.
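The third behavior, rejecting a variable percentage of calls, can be sketched as follows. This is only an illustration: the class `LoadShedder`, its thresholds, and the linear ramp between them are assumptions, not something taken from a particular library.

```java
/** Hypothetical load shedder: rejects a growing share of calls as load rises. */
final class LoadShedder {
    private final double startShedding; // load (0.0 to 1.0) at which rejection begins
    private final double rejectAll;     // load at which every call is rejected

    LoadShedder(double startShedding, double rejectAll) {
        if (!(startShedding < rejectAll)) {
            throw new IllegalArgumentException("startShedding must be below rejectAll");
        }
        this.startShedding = startShedding;
        this.rejectAll = rejectAll;
    }

    /** Share of calls to reject: 0 up to the start threshold, 1 from rejectAll, linear between. */
    double rejectionRate(double load) {
        if (load <= startShedding) {
            return 0.0;
        }
        if (load >= rejectAll) {
            return 1.0;
        }
        return (load - startShedding) / (rejectAll - startShedding);
    }

    /**
     * Decides the fate of one call. The caller passes a random number in [0, 1),
     * which keeps this method deterministic and easy to test.
     */
    boolean shouldReject(double load, double random) {
        return random < rejectionRate(load);
    }
}
```

In a request handler this might be called as `shedder.shouldReject(currentLoad, ThreadLocalRandom.current().nextDouble())`, returning an overload error (HTTP 503, for instance) when the result is true so that a load balancer can retry against another instance.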
Many behaviors are possible; which ones are appropriate depends on the system and its users. A good starting place is to go through a service line by line and call by call and ask:
1. Is this call necessary at all? If not, you might be able to remove it completely.
2. If this call is necessary … is it necessary when the system is under stress or overload? Could it be dropped or postponed (an example is a scheduled task or an ‘informational’ call to another system)?
It might help if you define a number of scenarios: ‘normal’, ‘elevated’, and ‘overload’ usage. The answers can then be distilled into a strategy that can be implemented.
Let's develop a simple, if contrived, web service and see where this leads us.
— -
A Simple Example of a Local Adaptive System
We have a simple web service that responds to a REST API call: it makes a downstream call or complicated calculation, logs some stuff, and then makes an ‘informational FYI’ downstream call to another system before returning. For simplicity's sake, the ‘essential’ complicated call will always take 200ms and the ‘informational’ call will always take 50ms; we will count the time each step takes.
The full Maven Java project can be found on GitHub here.
The ‘basic’ non-adaptive endpoint is covered by this code snapshot:
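A minimal sketch of what such an endpoint might look like, written in plain Java rather than taken from the project. The class `BasicHandler`, its method names, the use of `Thread.sleep` to simulate the fixed 200ms and 50ms costs, and the per-step counters are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical stand-in for the non-adaptive endpoint; not the project's code. */
final class BasicHandler {
    private static final Logger LOG = Logger.getLogger(BasicHandler.class.getName());

    // One counter per step, so a test run can report how often each step executed.
    final AtomicLong essentialCalls = new AtomicLong();
    final AtomicLong informationalCalls = new AtomicLong();

    /** Always does everything: essential work, logging, then the FYI call. */
    String handle(String request) {
        String result = essentialCall(request);
        LOG.info("Handled request " + request + " -> " + result);
        informationalCall(request);
        return result;
    }

    private String essentialCall(String request) {
        pause(200); // assumption: the fixed 200ms cost is simulated with a sleep
        essentialCalls.incrementAndGet();
        return "processed:" + request;
    }

    private void informationalCall(String request) {
        pause(50); // assumption: the fixed 50ms cost is simulated with a sleep
        informationalCalls.incrementAndGet();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```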
Going through this, we define the following three scenarios based on the CPU load of the host where the service instance is running:
1. NORMAL: the CPU load is 50% or less. We do all the logging and make all the ‘informational’ downstream calls.
2. WARNING: the CPU load is between 50% and 80%. We drop most of the logging but still keep the ‘informational’ downstream calls.
3. DANGEROUS: the CPU load is 80% or more. We stop logging and drop the ‘informational’ downstream calls.
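The three scenarios map naturally onto an enum. The sketch below is one possible encoding; the names `LoadLevel`, `fromCpuLoad`, `fullLogging`, and `informationalCallsAllowed` are hypothetical, and the CPU load is assumed to be given as a fraction between 0.0 and 1.0.

```java
/** Hypothetical encoding of the three load scenarios. */
enum LoadLevel {
    NORMAL, WARNING, DANGEROUS;

    /** Classifies a CPU load given as a fraction (0.50 means 50%). */
    static LoadLevel fromCpuLoad(double cpuLoad) {
        if (cpuLoad >= 0.80) {
            return DANGEROUS; // 80% or more
        }
        if (cpuLoad > 0.50) {
            return WARNING;   // above 50% and below 80%
        }
        return NORMAL;        // 50% or less
    }

    /** Full logging only happens under normal load. */
    boolean fullLogging() {
        return this == NORMAL;
    }

    /** The 'informational' calls are kept until the load becomes dangerous. */
    boolean informationalCallsAllowed() {
        return this != DANGEROUS;
    }
}
```

Keeping the thresholds in one place makes it easy to tune them, or to swap CPU load for another signal, without touching the request-handling code.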
The adaptive version of the original call can (in a crude way) be defined by the following code snapshot:
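A minimal sketch of how an adaptive handler along these lines might look, again in plain Java and not taken from the project. The class `AdaptiveHandler`, the injected `DoubleSupplier` for CPU load (a fraction between 0.0 and 1.0), the sleeps standing in for the fixed call costs, and the exact choice of which log lines to keep are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleSupplier;
import java.util.logging.Logger;

/** Hypothetical adaptive endpoint: sheds logging and FYI calls as CPU load rises. */
final class AdaptiveHandler {
    private static final Logger LOG = Logger.getLogger(AdaptiveHandler.class.getName());

    private final DoubleSupplier cpuLoad;
    final AtomicLong essentialCalls = new AtomicLong();
    final AtomicLong informationalCalls = new AtomicLong();
    final AtomicLong logLines = new AtomicLong();

    AdaptiveHandler(DoubleSupplier cpuLoad) {
        this.cpuLoad = cpuLoad;
    }

    String handle(String request) {
        double load = cpuLoad.getAsDouble();
        boolean normal = load <= 0.50;    // NORMAL: 50% or less
        boolean dangerous = load >= 0.80; // DANGEROUS: 80% or more

        String result = essentialCall(request); // the essential work is never skipped

        if (normal) {
            log("Request " + request + " produced " + result); // detailed line
        }
        if (!dangerous) {
            log("Handled " + request); // the one line that survives in WARNING
            informationalCall(request);
        }
        return result;
    }

    private void log(String message) {
        LOG.info(message);
        logLines.incrementAndGet();
    }

    private String essentialCall(String request) {
        pause(200); // assumption: fixed 200ms cost simulated with a sleep
        essentialCalls.incrementAndGet();
        return "processed:" + request;
    }

    private void informationalCall(String request) {
        pause(50); // assumption: fixed 50ms cost simulated with a sleep
        informationalCalls.incrementAndGet();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Passing the load in as a supplier means the three scenarios can be exercised with fixed values, for example `new AdaptiveHandler(() -> 0.90)` to force the DANGEROUS path.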
We then built and deployed the application (the adaptive and non-adaptive versions are deployed as different REST endpoints in the same service). The testing methodology is simple: call each endpoint under different load scenarios (numbers of concurrent callers calling as fast as they can) for a minute and track the requests per second (RPS) and CPU usage. The Bombardier load tool provides the stress and reports the throughput in requests per second, and the service itself has a background thread that dumps the CPU load and the number of calls at each step. Each load test was run for 60 seconds, with gaps between tests to allow all CPUs to return to a normal low base level.
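The background thread that dumps CPU load and call counts could be sketched along the following lines. The class `StatsReporter`, the output format, and the use of a scheduled executor are assumptions for illustration; the CPU load is again supplied as a fraction between 0.0 and 1.0.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleSupplier;

/** Hypothetical reporter that periodically prints CPU load and per-step call counts. */
final class StatsReporter implements AutoCloseable {
    final AtomicLong essentialCalls = new AtomicLong();
    final AtomicLong informationalCalls = new AtomicLong();

    private final DoubleSupplier cpuLoad;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "stats-reporter");
                thread.setDaemon(true); // do not keep the JVM alive just for reporting
                return thread;
            });

    StatsReporter(DoubleSupplier cpuLoad) {
        this.cpuLoad = cpuLoad;
    }

    /** One line of output: current load as a percentage plus the counters. */
    String snapshot() {
        return String.format(Locale.ROOT, "cpu=%.0f%% essential=%d informational=%d",
                cpuLoad.getAsDouble() * 100, essentialCalls.get(), informationalCalls.get());
    }

    /** Starts printing a snapshot every periodSeconds seconds. */
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(snapshot()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The request handlers would increment the counters as each step runs, and the printed lines can then be lined up against the load tool's report for the same time window.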
— -
Test Results
So with two clients, the basic endpoint can process 8 requests/second and the adaptive one the same.
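The figure of 8 requests/second is in line with a back-of-the-envelope estimate. The sketch below assumes that each client waits for one request to finish before sending the next, and that a request costs little more than its fixed delays (200ms plus 50ms); it ignores CPU contention, so it is only an upper bound. The class `ThroughputEstimate` is hypothetical.

```java
/** Hypothetical upper-bound estimate of throughput for closed-loop clients. */
final class ThroughputEstimate {

    /** Requests per second when each client issues calls back to back. */
    static double requestsPerSecond(int clients, int... stepMillis) {
        int totalMillis = 0;
        for (int millis : stepMillis) {
            totalMillis += millis; // the steps run one after another within a request
        }
        return clients * 1000.0 / totalMillis;
    }
}
```

With two clients and both steps, `requestsPerSecond(2, 200, 50)` gives 8.0. Dropping the 50ms step, `requestsPerSecond(2, 200)`, gives 10.0, which is the most that skipping the ‘informational’ call could buy under these assumptions.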
The table and graph show pretty much the same behavior until the system load reaches 80%, whereupon the adaptive endpoint stops making the ‘informational’ call. This frees up CPU cycles for the adaptive endpoint to process more of the incoming requests AND still do more of the ‘important’ stuff.
Naturally, a service's ability to adapt, and the ‘mileage’ you get from doing so, will vary from service to service and domain to domain, but it is something to think about.