An 8-step process to architect solutions on the Google Cloud Platform (GCP)


As we know, architecting and designing a scalable, cost-effective and transformative business solution on the amazing Google Cloud Platform (GCP) requires a lot of thought and knowledge. There are many things to consider, so we need a comprehensive approach and a detailed process to make sure our solution is robust and meets expectations.

In a series of blog posts, we will deconstruct this process in detail.

Here is an overview of the 8-step process, highlighting all the considerations for a successful GCP solution.

8-step process to architect a solution on the Google Cloud Platform (GCP)
  • Defining the service
    • Qualitative – why do we need the solution, what problem does it solve, who are the stakeholders, when do users need/want the solution
    • Quantitative – uptime and downtime needs, cost of the required data volume and data throughput, number and location of users
    • Scaling – are there scaling needs right off the bat? When should the design be iterated? What are the limiting factors as the business grows?
    • Size – dimensions, replication and rate of change
    • High-level SLAs required
  • Business logic layer design
    • Business rules for the solution
    • How data will be created, stored and changed
    • Microservice architecture if needed
    • Twelve-factor app design
    • Vertical scaling or horizontal scaling
    • Design first, dimension later
  • Data layer design
    • Data persistence mechanisms (db service and storage service)
    • Data access layer that encapsulates persistence mechanisms and exposes data
    • CAP theorem – you can only pick 2 of these 3 for data design: Consistency, Availability, Partition tolerance
    • Considerations – uptime, latency, scale, velocity and privacy
    • Data migration/transfer
    • Choosing between Regional, Multi-Regional, Nearline and Coldline storage classes
  • Presentation layer design
    • Network configuration for data transfer within the service (location, load balancing, caching)
    • Cloud CDN – network edge points
    • Multi-cloud with Dedicated Interconnect if needed
    • Hybrid cloud network design
    • VPN configurations for reliability and aggregate capacity/bandwidth
  • Design for resiliency, scalability and DR
    • Failure due to loss of resources needed by the service
    • Single points of failure – replicate everything, N+2 always
    • Correlated failures – decouple service, use microservices
    • Failure due to overload
    • Cloud DNS
    • Tiered backup for resiliency
    • Resiliency – health checks, instance plan, storage plan, network plan
    • DR – multi-region: app state info, database backups, Deployment Manager templates (to recreate infrastructure)
  • Design for security
    • Transparency and segregation of duties – what Google does and what the customer does
    • Firewall security – first line of defense
    • Secure VPC (distributed firewall, Cloud IAM, bastion hosts, isolation by limiting which VMs have public IPs)
    • Cloud Interconnect (Direct Peering, Carrier Interconnect, VPN, private interconnect)
    • Third-party virtual appliances – next-gen firewalls, logging, monitoring, compliance
    • Global load balancer – built-in DDoS protection, autoscaling, cross-region overflow/failover, SSL termination
    • Google network – global network, protection against UDP-based attacks
    • VMs with and without external IP addresses (bastion hosts, SSH)
    • API access control using Cloud Endpoints
    • Edge protections against DDoS – Cloud CDN, global load balancing, TCP/SSL proxy
    • Network protections against DDoS – cloud network firewall, VM traffic throttling
    • Server-side encryption by GCP, customer-managed encryption keys
    • Identity, access and auditing (IAM, service accounts, standards compliance, auditing with the open-source Forseti Security)
  • Capacity planning and cost optimization
    • Forecast – monitor growth, predict future demand, plan for launches (estimate instance overhead, persistent disk, network capacity, workload)
    • Allocate – estimate the capacity needed, load test and validate, calculate the resources required using the resource-to-capacity allocation ratio (alternatively, check whether caching, tuning or better algorithms reduce the resources you need), optimize disk and network costs
    • Approve
    • Deploy
  • Deployment, monitoring, alerting and incident response
    • Launch checklist – dependencies (shared infrastructure, external third parties), capacity plan (verify overload handling), single points of failure, security and access control, rollout plan (gradual, staged, % of users, etc.)
    • Launch automation via Deployment Manager
    • Monitoring – black box: user experience; white box: internals; alerts, tickets and logging
    • Push- and pull-based metrics
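The Quantitative bullet under "Defining the service" mostly comes down to arithmetic. As a minimal Python sketch (the function names are hypothetical, and it assumes a 30-day month and decimal gigabytes), an availability target can be turned into a downtime budget, and a daily data volume into the network throughput it implies:

```python
"""Back-of-the-envelope numbers for the Quantitative step: downtime budget and throughput."""

SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_minutes(sla_percent: float, period_days: float = 30) -> float:
    """Maximum downtime (in minutes) allowed by an availability target over a period."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100)


def sustained_throughput_mbps(gb_per_day: float, peak_factor: float = 1.0) -> float:
    """Average throughput (megabits/s) needed to move gb_per_day, scaled by an assumed peak factor."""
    bits_per_day = gb_per_day * 1e9 * 8  # decimal GB -> bits
    return bits_per_day / SECONDS_PER_DAY / 1e6 * peak_factor


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% over 30 days -> {downtime_budget_minutes(sla):.1f} min of downtime")
    print(f"500 GB/day at 3x peak -> {sustained_throughput_mbps(500, peak_factor=3):.1f} Mbps")
```

At 99.9% over 30 days the budget is about 43 minutes, and each extra nine divides it by ten.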
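For the twelve-factor bullet under "Business logic layer design", the factor that most directly affects deployment is keeping config in the environment rather than in code. A sketch of that idea, where the variable names DATABASE_URL, BUCKET_NAME, PORT and DEBUG are illustrative assumptions rather than anything GCP requires:

```python
"""Twelve-factor 'config' sketch: read deployment settings from environment variables."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    database_url: str
    bucket_name: str
    port: int
    debug: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Build config from the environment; required settings fail fast if missing."""
    env = os.environ if env is None else env
    try:
        # Hypothetical variable names; use whatever your deployment defines.
        database_url = env["DATABASE_URL"]
        bucket_name = env["BUCKET_NAME"]
    except KeyError as missing:
        raise RuntimeError(f"missing required environment variable {missing}") from None
    return ServiceConfig(
        database_url=database_url,
        bucket_name=bucket_name,
        port=int(env.get("PORT", "8080")),
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Passing the mapping explicitly keeps the loader testable without touching the real process environment, while production code simply calls `load_config()`.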
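"Data layer design" calls for a data access layer that encapsulates the persistence mechanism. One common way to do that is the repository pattern. The sketch below is a hypothetical example (the Order model and class names are invented) in which the business rule only sees an interface, so the backing store can change without touching business logic:

```python
"""Data access layer sketch: callers depend on an interface, not on a specific database."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Order:
    order_id: str
    customer_id: str
    total_cents: int


class OrderRepository(ABC):
    """Interface the business logic layer uses; hides the persistence mechanism."""

    @abstractmethod
    def get(self, order_id: str) -> Optional[Order]: ...

    @abstractmethod
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository(OrderRepository):
    """Stand-in backend for tests; a production class might wrap a managed database instead."""

    def __init__(self) -> None:
        self._rows: Dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


def apply_discount(repo: OrderRepository, order_id: str, percent: int) -> Order:
    """Business rule that works with any OrderRepository implementation."""
    order = repo.get(order_id)
    if order is None:
        raise KeyError(order_id)
    order.total_cents = order.total_cents * (100 - percent) // 100
    repo.save(order)
    return order
```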
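To choose among the storage options listed under "Data layer design", one rough heuristic is to decide by how often objects are read and where the readers are. This sketch uses the class identifiers the Cloud Storage API accepts. The thresholds are assumptions loosely based on Google's guidance: Nearline for data read about once a month or less (30-day minimum storage duration) and Coldline for about once a quarter or less (90-day minimum). Newer Cloud Storage documentation folds Regional and Multi-Regional into a single Standard class whose placement is set by the bucket location.

```python
"""Sketch: pick a Cloud Storage class from expected access frequency and location needs."""
from enum import Enum


class StorageClass(Enum):
    REGIONAL = "REGIONAL"
    MULTI_REGIONAL = "MULTI_REGIONAL"
    NEARLINE = "NEARLINE"
    COLDLINE = "COLDLINE"


def choose_storage_class(reads_per_year: float, multi_region_users: bool) -> StorageClass:
    """Heuristic only; the cut-offs below are assumed, not an official rule."""
    if reads_per_year <= 4:       # about once a quarter or less
        return StorageClass.COLDLINE
    if reads_per_year <= 12:      # about once a month or less
        return StorageClass.NEARLINE
    # Frequently accessed ("hot") data: place it near the users who read it.
    return StorageClass.MULTI_REGIONAL if multi_region_users else StorageClass.REGIONAL
```

Minimum storage durations and retrieval fees mean cold classes can cost more than hot ones for data that is read more often than expected, so the access estimate matters.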
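The "N+2 always" rule under "Design for resiliency, scalability and DR" can be made concrete: size N for peak load, then add two instances so that one planned outage (for example, a rolling upgrade) plus one unplanned failure still leave full capacity. A sketch, where the per-instance capacity is an assumed load-test figure:

```python
"""Sketch of N+2 sizing: enough instances for peak load even with two of them down."""
import math


def instances_for_peak(peak_rps: float, rps_per_instance: float, spare: int = 2) -> int:
    """N = instances needed at peak; provision N + spare so that losing one to maintenance
    and one to an unplanned failure still leaves N serving."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    n = math.ceil(peak_rps / rps_per_instance)
    return n + spare
```

For example, 4,500 requests/s at an assumed 1,000 requests/s per instance gives N = 5, so 7 instances are provisioned.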
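For "Failure due to overload", a common defense is to shed excess load early rather than let queues grow until everything times out. The sketch below uses a token bucket, which is one technique among several (the post does not prescribe one). The rate and capacity values are assumptions you would tune from load tests, and the injectable clock exists only to make the class testable:

```python
"""Load-shedding sketch for 'failure due to overload': a token-bucket admission check."""
import time
from typing import Callable


class TokenBucket:
    """Admit up to `rate` requests/s on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        now = self._clock()
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller should reject fast (for example, HTTP 503) instead of queueing
```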
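For "Tiered backup for resiliency", a tiered retention policy keeps many recent backups and progressively sparser older ones. The tier sizes below (7 daily backups, 4 weeks of Sunday backups, roughly 12 months of first-of-month backups) are assumptions chosen for illustration:

```python
"""Tiered backup retention sketch (daily / weekly / monthly tiers)."""
from datetime import date
from typing import Iterable, Set


def backups_to_keep(backup_dates: Iterable[date], today: date,
                    daily: int = 7, weekly: int = 4, monthly: int = 12) -> Set[date]:
    """Return the backup dates to retain; everything else is eligible for deletion."""
    keep: Set[date] = set()
    for d in backup_dates:
        age = (today - d).days
        if 0 <= age < daily:
            keep.add(d)                      # tier 1: every recent daily backup
        elif d.weekday() == 6 and 0 <= age < weekly * 7:
            keep.add(d)                      # tier 2: Sunday backups for `weekly` weeks
        elif d.day == 1 and 0 <= age < monthly * 31:
            keep.add(d)                      # tier 3: first-of-month backups for ~`monthly` months
    return keep
```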
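To illustrate the firewall bullets under "Design for security", here is a simplified model of how VPC firewall rules resolve an ingress connection. Among the matching rules, the lowest priority number wins. At equal priority, deny beats allow. Anything unmatched falls through to the implied deny-ingress rule. The model deliberately ignores protocols, target tags, service accounts and egress, and the rule names are made up, so treat it as a mental model rather than a replacement for the real firewall:

```python
"""Simplified model of how VPC firewall rules resolve an ingress connection."""
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class IngressRule:
    name: str
    priority: int              # 0-65535; lower number = higher precedence
    action: str                # "allow" or "deny"
    source_ranges: Sequence[str]
    ports: Sequence[int]


def is_allowed(rules: Iterable[IngressRule], source_ip: str, port: int) -> bool:
    ip = ipaddress.ip_address(source_ip)
    matching = [
        r for r in rules
        if port in r.ports and any(ip in ipaddress.ip_network(c) for c in r.source_ranges)
    ]
    if not matching:
        return False  # implied deny-ingress rule
    best = min(r.priority for r in matching)
    top = [r for r in matching if r.priority == best]
    # At equal priority, a deny rule takes precedence over an allow rule.
    return all(r.action == "allow" for r in top)
```

Usage: SSH is allowed only from a hypothetical bastion subnet, while HTTPS is open except for one blocked range.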
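The Forecast and Allocate bullets under "Capacity planning and cost optimization" can be sketched as two small functions. This sketch assumes compound monthly growth for the forecast. It also interprets the resource-to-capacity allocation ratio as a target utilization: capacity is measured in a load test, then run at, say, 60% of that figure to leave headroom. Both are illustrative assumptions, not a GCP formula:

```python
"""Forecast-then-allocate sketch for capacity planning."""
import math


def forecast_peak(current_peak: float, monthly_growth: float, months: int,
                  launch_multiplier: float = 1.0) -> float:
    """Compound organic growth forward, then apply an expected launch spike."""
    return current_peak * (1 + monthly_growth) ** months * launch_multiplier


def instances_needed(demand: float, capacity_per_instance: float,
                     target_utilization: float = 0.6) -> int:
    """capacity_per_instance should come from a load test; running below 100% leaves headroom."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(demand / (capacity_per_instance * target_utilization))
```

For example, a peak of 1,000 requests/s growing 10% a month reaches about 1,210 after two months. At an assumed 250 requests/s per instance run at 60%, that needs 9 instances.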
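The launch checklist's rollout plan mentions exposing a release to a percentage of users. One way to do that, sketched here with a hypothetical feature name, is to hash each user into a stable bucket. Raising the percentage then only adds users and never flips existing ones back:

```python
"""Gradual rollout sketch: deterministically enable a release for a percentage of users."""
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Hash feature+user into one of 10,000 buckets. The same user always lands in the
    same bucket, so a higher percent is always a superset of a lower one."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100
```

Including the feature name in the hash gives each feature an independent sample of users, so the same early adopters are not always the first to see every change.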
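For the monitoring and alerting bullets, a white-box alert can be as simple as comparing the error ratio over a sliding window with a threshold. The window, threshold and minimum-traffic guard in this sketch are assumed values. A managed system such as Cloud Monitoring would normally evaluate rules like this for you:

```python
"""White-box alerting sketch: fire when the error ratio over a sliding window exceeds a threshold."""
from collections import deque
from typing import Deque, Tuple


class ErrorRateAlert:
    def __init__(self, window_seconds: float, threshold: float, min_requests: int = 20) -> None:
        self.window = window_seconds
        self.threshold = threshold
        self.min_requests = min_requests
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, is_error)

    def record(self, timestamp: float, is_error: bool) -> None:
        self._events.append((timestamp, is_error))

    def should_fire(self, now: float) -> bool:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window:
            self._events.popleft()
        total = len(self._events)
        if total < self.min_requests:
            return False  # too little traffic to judge; avoids noisy pages
        errors = sum(1 for _, err in self._events if err)
        return errors / total > self.threshold
```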
