A multitenant Kubernetes cluster is shared by multiple users and workloads that are commonly referred to as "tenants." This definition includes Kubernetes clusters that are shared by different teams or divisions within an organization. It also includes clusters that are shared by per-customer instances of a software-as-a-service (SaaS) application. Cluster multitenancy is an alternative to managing many single-tenant dedicated clusters. The operators of a multitenant Kubernetes cluster must isolate tenants from each other. This isolation minimizes the damage that a compromised or malicious tenant can do to the cluster and to other tenants.
Although Kubernetes cannot guarantee perfectly secure isolation between tenants, it does offer features that may be sufficient for specific use cases. As a best practice, you should separate each tenant and its Kubernetes resources into their own namespaces. You can then use Kubernetes RBAC and Network Policies to enforce tenant isolation. (RBAC stands for role-based access control.) For example, the following picture shows the typical SaaS Provider Model that hosts multiple instances of the same application on the same cluster, one for each tenant. Each application lives in a separate namespace.
The Application Gateway Ingress Controller (AGIC) is a Kubernetes application, which makes it possible for Azure Kubernetes Service (AKS) customers to use an Azure Application Gateway to expose their containerized applications to the Internet. AGIC monitors the Kubernetes cluster that it is hosted on and continuously updates an Application Gateway so that the selected services are exposed to the Internet. The Ingress Controller runs in its own pod on the customer's AKS instance. AGIC monitors a subset of Kubernetes Resources for changes. The state of the AKS cluster is translated to Application Gateway-specific configuration and applied to the Azure Resource Manager (ARM). This architecture sample shows proven practices to deploy a public or private Azure Kubernetes Service (AKS) cluster, with an Azure Application Gateway and an Application Gateway Ingress Controller add-on.
Azure Container Registry is a managed, private Docker registry service based on the open-source Docker Registry 2.0. You can use Azure container registries with your existing container development and deployment pipelines, or use Azure Container Registry Tasks to build container images in Azure. Build on demand, or fully automate builds with triggers, such as source code commits and base image updates.
Azure Kubernetes Services simplifies deploying a managed Kubernetes cluster in Azure by offloading the operational overhead to Azure. As a hosted Kubernetes service, Azure handles critical tasks, like health monitoring and maintenance. Since Kubernetes masters are managed by Azure, you only manage and maintain the agent nodes.
Azure Key Vault securely stores and controls access to secrets like API keys, passwords, certificates, and cryptographic keys. Azure Key Vault also lets you easily provision, manage, and deploy public and private Transport Layer Security/Secure Sockets Layer (TLS/SSL) certificates, for use with Azure and your internal connected resources.
Azure Application Gateway Azure Application Gateway is a web traffic load balancer that enables you to manage the inbound traffic to multiple downstream web applications and REST APIs. Traditional load balancers operate at the transport layer (OSI layer 4 - TCP and UDP), and they route traffic based on source IP address and port, to a destination IP address and port. The Application Gateway instead is an application layer (OSI layer 7) load balancer. (OSI stands for Open Systems Interconnection, TCP stands for Transmission Control Protocol, and UDP stands for User Datagram Protocol.)
Web Application Firewall or WAF is a service that provides centralized protection of web applications from common exploits and vulnerabilities. WAF is based on rules from the OWASP (Open Web Application Security Project) core rule sets. Azure WAF allows you to create custom rules that are evaluated for each request that passes through a policy. These rules hold a higher priority than the rest of the rules in the managed rule sets. The custom rules contain a rule name, rule priority, and an array of matching conditions. If these conditions are met, an action is taken (to allow or block).
Azure Bastion is a fully managed platform as a service (PaaS) that you provision inside your virtual network. Azure Bastion provides secure and seamless Remote Desktop Protocol (RDP) and secure shell (SSH) connectivity to the VMs in your virtual network, directly from the Azure portal over TLS.
Azure Virtual Machines provides on-demand, scalable computing resources that give you the flexibility of virtualization, without having to buy and maintain the physical hardware.
Azure Virtual Network is the fundamental building block for Azure private networks. With Virtual Network, Azure resources (like VMs) can securely communicate with each other, the internet, and on-premises networks. An Azure Virtual Network is similar to a traditional network that's on premises, but it includes Azure infrastructure benefits, such as scalability, availability, and isolation.
Virtual Network Interfaces let Azure virtual machines communicate with the internet, Azure, and on-premises resources. You can add several network interface cards to one Azure VM, so that child VMs can have their own dedicated network interface devices and IP addresses.
Azure Managed Disks provides block-level storage volumes that Azure manages on Azure VMs. The available types of disks are Ultra disks, Premium solid-state drives (SSDs), Standard SSDs, and Standard hard disk drives (HDDs).
Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
Azure Private Link enables you to access Azure PaaS services (for example, Azure Blob Storage and Key Vault) and Azure-hosted customer-owned/partner services, over a private endpoint in your virtual network.
Although the performance considerations do not fully pertain to multitenancy in Azure Kubernetes Service (AKS), we believe they are essential requirements when deploying this solution:
For low latency workloads, consider deploying a dedicated node pool in a proximity placement group. When deploying your application in Azure, spreading Virtual Machine (VM) instances across regions or availability zones creates network latency, which may impact the overall performance of your application. A proximity placement group is a logical grouping that's used to make sure Azure compute resources are physically located close to each other. Some use cases (such as gaming, engineering simulations, and high-frequency trading (HFT)) require low latency and tasks that complete quickly. For high-performance computing (HPC) scenarios such as these, consider using proximity placement groups (PPG) for your cluster's node pools.
Always use smaller container images, as it helps you to create faster builds. Smaller images are also less vulnerable to attack vectors, because of a reduced attack surface.
Use Kubernetes autoscaling to dynamically scale out the number of worker nodes of an AKS cluster when the traffic increases. With Horizontal Pod Autoscaler and a cluster autoscaler, node and pod volumes get adjusted dynamically in real-time, to match the traffic condition and to avoid downtimes that are caused by capacity issues. For more information
Consider deploying the node pools of your AKS cluster, across all the Availability Zones within a region, and use an Azure Standard Load Balancer or Azure Application Gateway in front of your node pools. This topology provides better resiliency, in case of an outage of a single datacenter. This way, cluster nodes are distributed across multiple datacenters, in three separate Availability Zones within a region.
Enable zone redundancy in Azure Container Registry, for intra-region resiliency and high availability.
Use Pod Topology Spread Constraints to control how pods are spread across your AKS cluster among failure-domains, such as regions, availability zones, and nodes.
Consider using Uptime SLA for AKS clusters that host mission-critical workloads. Uptime SLA is an optional feature to enable a financially backed, higher SLA for a cluster. Uptime SLA guarantees 99.95% availability of the Kubernetes API server endpoint, for clusters that use Availability Zones. And it guarantees 99.9% availability for clusters that don't use Availability Zones. AKS uses master node replicas across update and fault domains, in order to ensure the SLA requirements are met.
Disaster recovery and business continuity
Consider deploying your solution to at least two paired Azure regions within a geography. You should also adopt a global load balancer, such as Azure Traffic Manager or Azure Front Door, with an active/active or active/passive routing method, in order to guarantee business continuity and disaster recovery.
Make sure to script, document, and periodically test any regional failover process in a QA environment, to avoid unpredictable issues if a core service is affected by an outage in the primary region.
These tests are also meant to validate if the DR approach meets the RPO/RTO targets, in conjunction to eventual manual processes and interventions that are needed for a failover.
Make sure you test fail-back procedures, to understand if they work as expected.
Where possible, don't store the service state inside the container. Instead, use an Azure platform as a service (PaaS) that supports multi-region replication.
If you use Azure Storage, prepare and test how to migrate your storage from the primary region to the backup region.
Introduce A/B testing and canary deployments in your application lifecycle management, to properly test an application before making it available for all users. There are several techniques that you can use to split the traffic across different versions of the same service.
As an alternative, you can use the traffic-splitting capabilities that are provided by a service mesh implementation. For more information, see:
Use Azure Container Registry or another container registry (like Docker Hub), to store the private Docker images that are deployed to the cluster. AKS can authenticate with Azure Container Registry, by using its Azure AD identity.
Although the monitoring considerations do not fully pertain to multitenancy in Azure Kubernetes Service (AKS), we believe they are essential requirements when deploying this solution:
The cost of this architecture depends on configuration aspects, like the following:
Scalability, meaning the number of instances that are dynamically allocated by services to support a given demand
Your disaster recovery level
After you assess these aspects, go to the Azure pricing calculator to estimate your costs. Also, for more pricing optimization options, see the Principles of cost optimization in the Microsoft Azure Well-Architected Framework.