Where should system state be stored? This is one of the key questions to answer when designing solutions that are based on the cloud. Google Cloud Platform (GCP) offers excellent options to architect a solution that handles state elegantly.
Let’s get a key principle out of the way – ‘Stateless is desired as much as possible!’
So what’s stateful vs. stateless? An action is stateful when it depends on the memory of preceding actions, and the data that must be remembered is called state information. Let’s take an analogy:
Let’s say you like burrito bowls very much and are preparing one in your kitchen. You need to plan and cook everything yourself, including the type of rice, the type of beans, the meat, and the salsa, veggies and guacamole you prefer. Then you prepare them step by step in the right order. This is state: you remember the right order and prepare the dish accordingly. This gives you a lot of control, but on the flip side it’s very hard to scale and you become a single point of failure.
Compare this with what happens in a Chipotle store: there are a few stations/people for the rice, beans, meat, veggies and guacamole. This lets the store scale and serve a lot of people, troubleshoot a single station if needed, and lets each station get very good at its one job. Think assembly line…
So you are probably asking: why is all this important, and how does it relate to building solutions on the cloud?
It all boils down to one question: where is the truth in the system? Some data, such as customer records or an active session on an eCommerce site, is stored centrally and retrieved on demand, and we cannot afford to lose it. If there is an issue with that data, how do we get back to the truth? A central control mechanism is also a choke point, so the critical decision becomes where we store the stateful information. Common sources of state are objects in a database, session information in memory, and per-user key-value pairs such as cookies.
The solution is to distribute state as much as possible and, quite frankly, the best state is no state. In a stateless architecture, you can apply more resources to a problem, relocate tasks as needed, and tolerate faults more easily, because there is less to recover at any time.
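To make the difference concrete, here is a minimal sketch in Python. The class names, the shopping-cart example and the plain dictionary standing in for an external state service are all assumptions made for illustration; they are not GCP APIs. It shows why a front end that keeps state in its own memory cannot be swapped for another, while a stateless one can.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFrontend:
    """Hypothetical front end that keeps each shopper's cart in its own memory."""
    carts: dict = field(default_factory=dict)

    def add_item(self, session_id: str, item: str) -> list:
        self.carts.setdefault(session_id, []).append(item)
        return list(self.carts[session_id])


@dataclass
class StatelessFrontend:
    """Hypothetical front end that holds no cart data of its own."""
    store: dict  # assumption: stand-in for an external, shared state service

    def add_item(self, session_id: str, item: str) -> list:
        # Read the current cart from the shared store, then write it back.
        cart = self.store.get(session_id, []) + [item]
        self.store[session_id] = cart
        return list(cart)


# Two stateful servers: the second one has never seen session "s1".
a, b = StatefulFrontend(), StatefulFrontend()
a.add_item("s1", "rice")
stateful_result = b.add_item("s1", "beans")   # "rice" is missing here

# Two stateless servers sharing one store: either can serve any request.
shared = {}
c, d = StatelessFrontend(shared), StatelessFrontend(shared)
c.add_item("s1", "rice")
stateless_result = d.add_item("s1", "beans")  # both items are present
```

In this sketch the stateless front ends are interchangeable, which is what lets you add more of them or move work between them freely.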
The worst thing that can happen is a ‘hotspot’. Let’s say you have multiple front-end servers that hold state information and, in an unfortunate situation, all the requests route to the same front end in spite of the load balancer.
To handle this, we push state out of the front ends to a backend, and not just to one backend service: we distribute it across a number of backend state servers with replication. This may increase latency, but it solves the more serious problem of losing state when a server fails.
This way, we go from a single server, where one failure means 100% unavailability, to multiple machines with redundant state and full recovery.
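The idea of replicated backend state servers can be sketched as follows. This is only an illustration under stated assumptions: `StateServer` and `ReplicatedStore` are hypothetical in-memory classes, not a real GCP service, and the sketch ignores the consistency problems a real replicated store must handle.

```python
class StateServer:
    """Hypothetical in-memory backend state server."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.up = True  # flipped to False to simulate a failed machine

    def put(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class ReplicatedStore:
    """Writes each key to every reachable replica; reads from the first that answers."""

    def __init__(self, servers):
        self.servers = servers

    def put(self, key, value):
        written = 0
        for server in self.servers:
            try:
                server.put(key, value)
                written += 1
            except ConnectionError:
                continue  # skip replicas that are down
        if written == 0:
            raise ConnectionError("no replica accepted the write")

    def get(self, key):
        for server in self.servers:
            try:
                return server.get(key)
            except (ConnectionError, KeyError):
                continue  # try the next replica
        raise KeyError(key)


servers = [StateServer("state-1"), StateServer("state-2"), StateServer("state-3")]
store = ReplicatedStore(servers)
store.put("session:42", {"cart": ["rice", "beans"]})

servers[0].up = False                 # simulate losing one machine
recovered = store.get("session:42")   # still served by a surviving replica
```

Writing to every replica is where the extra latency comes from; in exchange, losing any single state server no longer loses the session.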
In summary, we can build a scalable architecture that meets most scenarios.