Azure Kubernetes Service (AKS) Hunting
Notebook Details
Python Version: 3.8
Platforms Supported: Azure Machine Learning (AML) and Visual Studio Code
Data Sources Leveraged:
Kubernetes control plane logging. To turn this on, head to your AKS cluster in the Azure Portal and find the Diagnostic settings entry in the left sidebar. Choose + Add diagnostic setting and select the log categories you are interested in recording.
kube-apiserver logs keep track of all of the requests made to your Kubernetes API server. They’re great for spotting attackers peeking around or attempting to make changes.
kube-audit logs provide a time-ordered sequence of all of the actions taken in the cluster and are brilliant for security auditing. They are a superset of the information contained in the kube-apiserver logs and include operations that are triggered inside the Kubernetes control plane. You can configure your audit policy using the audit.k8s.io/v1/Policy resource type by following the instructions here.
AKS VMSS auditd logging. Open source projects like aks-auditd and Microsoft's OMS Agent for Linux can help you set this up.
Description
This notebook contains hunting hypotheses and queries you can use and expand upon to hunt for adversary activity on your Azure Kubernetes Service (AKS) cluster.
Contents
Prepare your notebook environment
Hunting Hypotheses and Queries
Tips for hunting
Initial Access
Command execution on your cluster's containers
Privilege Escalation
Deployment of privileged containers with the intention to container escape onto AKS worker nodes
Pivoting on high risk host volume path mounts
Execution
Container and worker node kernel activity (Syslog)
Stack counting all program execution on worker node and containers
Investigating potential anomalies in program and command line execution
Determine the distribution of program executions and arguments passed to it
Program execution associated with data exfiltration and program installations
Lateral Movement
Move laterally to cloud resources by requesting access token from IMDS server
Anomalous requests to the Kubernetes API Server
Persistence
Unusual Kubernetes objects being deployed
Container image poisoning
Baseline of container images and image registries
Prepare your environment
This notebook uses kqlmagic and datasets within your Log Analytics workspace to support your AKS hunting. The following section installs kqlmagic and authenticates to your Log Analytics workspace.
Install Pre-requisites
This section of the notebook installs kqlmagic which will be used in this notebook to execute native Kusto queries.
Connect to your Log Analytics Workspace
The following cell connects you to the Log Analytics Workspace (LAW) within your Azure tenant. Copy the code that the cell outputs and enter it on the DeviceLogin site.
⚠️ Please update the LOG_ANALYTICS_WORSPACE_ID to specify the Workspace ID of your Log Analytics Workspace. Your LAW ID can be found as described here.
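As a rough sketch of what the connection step involves, the helper below validates a workspace ID and builds a connection string for kqlmagic's device-code login. The helper name build_kql_connection, the alias and the all-zero workspace ID are hypothetical, and the loganalytics://code;workspace=...;alias=... layout is an assumption based on kqlmagic's documented connection-string style, so check it against the kqlmagic version you have installed.

```python
import re

# A Log Analytics Workspace ID is a GUID; reject anything else early.
_GUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def build_kql_connection(workspace_id, alias="aks-hunting"):
    """Return a kqlmagic connection string that uses device-code login (assumed format)."""
    if not _GUID.match(workspace_id):
        raise ValueError("workspace_id should be a GUID")
    return f"loganalytics://code;workspace='{workspace_id}';alias='{alias}'"


# Placeholder ID for illustration only; substitute your own Workspace ID.
connection = build_kql_connection("00000000-0000-0000-0000-000000000000")
print(connection)
```

The returned string would then be passed to the %kql magic in a notebook cell, which prompts for the device login code.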
Hunting Hypotheses and Queries
When hunting for an adversary, the goal is not to enumerate every tactic, technique and procedure they could use. The goal is to find and examine the specific junctions an adversary would need to cross to execute a successful attack. To do this, we have provided some hunting queries you can use as signals that something interesting is happening, as well as additional queries and context to help you dive deeper.
⚠️ All these queries serve as starting points for your hunting and investigations. They contain variables that can be expanded on and tweaked to be more applicable to your own environment!
Tips for Hunting
When creating your hunt hypotheses, think about what actions an adversary would need to perform to attack your workloads.
Pay attention to the User Agent used to make requests to Azure Resource Manager (ARM), and in this case, the Kubernetes API. For example, for production workloads, it might be unusual for you to interact with your cluster using kubectl or the Azure CLI. If you see this activity, it is a signal that someone is performing hands-on-keyboard activity on your cluster or subscription.
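The user agent tip can be sketched in Python once the relevant log rows have been exported as dictionaries. The field names user and userAgent, the marker substrings and the sample rows are assumptions for illustration; adjust them to match your own log schema and tooling.

```python
from collections import Counter

# Assumed substrings that suggest a human-driven client; tune for your environment.
INTERACTIVE_AGENT_MARKERS = ("kubectl", "azure-cli", "azurecli")


def is_interactive_agent(user_agent):
    """True if the user agent string looks like kubectl or the Azure CLI."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in INTERACTIVE_AGENT_MARKERS)


def interactive_requests(events):
    """Count requests per (user, user agent) that look hands-on-keyboard."""
    counts = Counter()
    for event in events:
        agent = event.get("userAgent", "")
        if is_interactive_agent(agent):
            counts[(event.get("user", "unknown"), agent)] += 1
    return counts


# Hypothetical sample rows.
sample_requests = [
    {"user": "system:serviceaccount:kube-system:coredns", "userAgent": "coredns/v1.8.0"},
    {"user": "alice@example.com", "userAgent": "kubectl/v1.21.2 (linux/amd64)"},
    {"user": "alice@example.com", "userAgent": "kubectl/v1.21.2 (linux/amd64)"},
]
interactive = interactive_requests(sample_requests)
for (user, agent), count in interactive.items():
    print(user, agent, count)
```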
Initial access
Command execution on your cluster's containers
Hunt Hypothesis
Before an attacker can execute commands on your pod, to look for secrets or escape onto the underlying host, they must first exec into the container.
This hypothesis allows us to look for an adversary at a key juncture of their attack. Using kubectl exec to execute commands on your container is advantageous to an attacker:
The commands being run aren't always logged and are less visible than commands specified in the container image.
It enables them to access the service account tokens for that pod. By default, every pod has a service account mounted whose permissions are determined by role bindings. In a production cluster, even on a worker node, there is usually at least one pod with a mounted token bound to a service account that is in turn bound to a clusterrolebinding, which grants access to do things like create pods or view secrets in all namespaces.
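One way to test this hypothesis outside of Kusto is to scan exported kube-audit records for requests against the pods/exec subresource. The sketch below assumes each record is a JSON string in the Kubernetes audit event format (objectRef, user, requestURI); the function name and the sample events are hypothetical.

```python
import json


def find_exec_events(raw_audit_lines):
    """Return a summary of every audit event that targets the pods/exec subresource."""
    hits = []
    for line in raw_audit_lines:
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("resource") == "pods" and ref.get("subresource") == "exec":
            hits.append({
                "user": (event.get("user") or {}).get("username"),
                "namespace": ref.get("namespace"),
                "pod": ref.get("name"),
                "uri": event.get("requestURI"),
            })
    return hits


# Hypothetical sample audit events, serialized the way they appear in the logs.
sample_audit_lines = [
    json.dumps({
        "verb": "create",
        "requestURI": "/api/v1/namespaces/default/pods/web/exec?command=sh",
        "user": {"username": "alice@example.com"},
        "objectRef": {"resource": "pods", "subresource": "exec",
                      "namespace": "default", "name": "web"},
    }),
    json.dumps({
        "verb": "list",
        "requestURI": "/api/v1/namespaces/default/pods",
        "user": {"username": "system:serviceaccount:default:app"},
        "objectRef": {"resource": "pods", "namespace": "default"},
    }),
]
exec_hits = find_exec_events(sample_audit_lines)
print(exec_hits)
```

The filter keys on the subresource rather than the verb, since the verb recorded for exec requests can vary between client versions.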
Privilege Escalation
Container Escape
By default, a container is isolated from the host system's network and memory address space by the Linux kernel's cgroups and namespace features. If a pod is "privileged", its containers are essentially running without these isolation constructs, which gives the container nearly all the same access as processes running on the host.
This gives an attacker a number of advantages:
1. Access to secrets on the underlying worker node:
User account secrets placed by kubeadm in /etc/kubernetes. Most other certificates are stored in /etc/kubernetes/pki.
/etc/kubernetes/azure.json on the host worker node, which contains a service principal that has access (by default Contributor) to all resources in the MC_ resource group.
The kubeconfig file on the worker VM, which contains the kubelet's service account token. This service account token has permissions to request all the cluster's secrets (depending on your RBAC configuration).
Secrets in tmpfs, that is, those stored in memory on the worker node.
2. Allows an attacker to run applications directly on the host. This gives an adversary a stealthy backdoor to your cluster.
There are two primary methods for performing a container escape:
Mount the host file system and escalate privileges to get full shell on the node. An attacker can do this by deploying a pod with one or more of the following privileged configurations:
The pod's securityContext set to privileged.
A privileged hostPath mount.
The host process ID namespace exposed by setting hostPID to true in the pod spec.
Exploit cgroups to get interactive root access on the node. A pre-requisite for this attack is to exec into the container itself, which the above hunting hypothesis should find. Read this blog post for an example of a container escape exploiting the Linux cgroups v1 notify_on_release feature.
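The privileged configurations listed for the first method can be checked against a pod spec with a few lines of Python. This is a minimal sketch that assumes the pod spec has already been parsed into a dictionary (for example from the request body in an audit event); the function name and the sample spec are hypothetical.

```python
def risky_pod_settings(pod_spec):
    """List the escape-enabling settings found in a pod spec dictionary."""
    findings = []
    # Sharing the host PID namespace is set at the pod level.
    if pod_spec.get("hostPID"):
        findings.append("hostPID")
    # hostPath volumes expose part of the worker node's file system.
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            findings.append("hostPath:" + str(volume["hostPath"].get("path")))
    # The privileged flag is set per container.
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        if (container.get("securityContext") or {}).get("privileged"):
            findings.append("privileged:" + str(container.get("name")))
    return findings


# Hypothetical pod spec resembling a container escape attempt.
sample_spec = {
    "hostPID": True,
    "containers": [
        {"name": "shell", "image": "alpine", "securityContext": {"privileged": True}},
    ],
    "volumes": [{"name": "host", "hostPath": {"path": "/"}}],
}
findings = risky_pod_settings(sample_spec)
print(findings)
```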
Hunt Hypothesis
An attacker looking for container escape will deploy a privileged container or modify an existing pod's configuration to give them elevated access to the host's process and network address space
Pivoting on high risk host volume path mounts
This query identifies containers deployed to your cluster that are configured in a way that exposes the underlying worker node's file system. This is a well-known configuration that enables container escape.
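To illustrate the pivot, the sketch below decides whether a hostPath mount exposes a sensitive location, either because it sits inside one or because it is a parent directory of one (such as the root of the file system). The list of high-risk paths is an assumption for illustration, drawn partly from the secrets locations described earlier; extend it for your environment.

```python
from pathlib import PurePosixPath

# Assumed examples of sensitive host paths; not an exhaustive list.
HIGH_RISK_PATHS = ("/etc", "/proc", "/root", "/var/lib/kubelet", "/var/run/docker.sock")


def is_high_risk_mount(host_path):
    """True if the mount exposes a sensitive path or a parent directory of one."""
    mount = PurePosixPath(host_path)
    for risky in map(PurePosixPath, HIGH_RISK_PATHS):
        inside_risky = mount == risky or risky in mount.parents
        contains_risky = mount in risky.parents
        if inside_risky or contains_risky:
            return True
    return False


for path in ("/", "/etc/kubernetes", "/data/app"):
    print(path, is_high_risk_mount(path))
```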
Execution
Container and worker node kernel activity (Syslog)
The following queries require you to enable auditd logging on your AKS worker node VMSS. Open Source projects like aks-auditd and Microsoft's OMS Agent for Linux can help you set this up.
auditd logging provides an easy and highly configurable way to gain visibility into kernel-level activity on your AKS worker nodes and containers. If you are running a multi-tenant cluster, visibility into worker node activity is critical, and the Kubernetes API server logs aren't always enough. auditd is a good solution to this.
In this case, we are going to use this audit logging to view syscall activity, primarily program executions.
Hunting Hypothesis
The activity of an attacker with the ability to execute commands on an AKS container or VMSS worker node will differ from the baseline activity of your AKS cluster.
Stack counting all program execution on worker node and containers
This query stack counts all the programs started following an execve (59) syscall. This gives us a high-level overview of the processes running on our cluster.
Some interesting program executions to look out for include the following:
azcopy and tar - Can be used to exfiltrate credentials to attacker-owned blob storage
curl and wget - Can be used to install executables on the host
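Stack counting itself is simple to reproduce on exported records. The sketch below assumes each auditd execution record has been reduced to a dictionary with an exe field holding the program path; the record layout and sample data are hypothetical, while the watchlist comes from the programs named above.

```python
import posixpath
from collections import Counter

# Programs called out above as interesting for exfiltration or installation.
WATCHLIST = {"azcopy", "tar", "curl", "wget"}


def stack_count_programs(exec_records):
    """Count executions per program name, most frequent first."""
    counts = Counter(posixpath.basename(record["exe"]) for record in exec_records)
    return counts.most_common()


# Hypothetical sample of execution records.
sample_execs = [
    {"exe": "/bin/ls"},
    {"exe": "/usr/bin/curl"},
    {"exe": "/bin/ls"},
    {"exe": "/usr/bin/tar"},
    {"exe": "/bin/ls"},
]
stacked = stack_count_programs(sample_execs)
flagged = [name for name, _ in stacked if name in WATCHLIST]
print(stacked)
print(flagged)
```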
Investigating potential anomalies in program and command line execution
The following query isolates program execution and the command line arguments passed to the program. We can use this to find anomalous programs executed by the host and any deviations from how this program is normally used.
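As a simple stand-in for time series anomaly detection, the sketch below flags hourly execution counts that sit far from the mean using a z-score. This is a swapped-in technique chosen for brevity, not necessarily what the notebook's query uses, and the threshold and sample counts are assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(hourly_counts, threshold=3.0):
    """Return (index, count) for buckets whose absolute z-score exceeds the threshold."""
    mu = mean(hourly_counts)
    sigma = pstdev(hourly_counts)
    if sigma == 0:
        # A flat series has no outliers.
        return []
    return [
        (index, count)
        for index, count in enumerate(hourly_counts)
        if abs(count - mu) / sigma > threshold
    ]


# Hypothetical executions per hour for one program; the last hour spikes.
sample_counts = [4, 5, 6, 5, 4, 5, 6, 5, 4, 5, 6, 60]
anomalies = zscore_anomalies(sample_counts)
print(anomalies)
```

A z-score is sensitive to short series and to the outlier itself inflating the standard deviation, so treat it as a first pass rather than a replacement for a seasonal decomposition.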
Determine the distribution of program executions and arguments passed to it
In this scenario, we are using basket to find the patterns in what programs are executed and how they are used (command line arguments). In this query, we are ordering the output to show us the most infrequent program executions across your AKS cluster.
You can also pipe the output of the above program execution time series anomaly query to basket in order to enrich it with information on how frequent that pattern (executable name and command line arguments) was found.
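The idea of ranking execution patterns by how rarely they occur can be approximated in Python by counting exact (program, arguments) pairs. Note that this is plain exact-match counting and not the frequent-pattern mining that basket performs; the field names exe and cmdline and the sample rows are hypothetical.

```python
from collections import Counter


def rarest_patterns(exec_records, top=5):
    """Return (exe, cmdline, count, share) tuples, least frequent first."""
    pairs = Counter((record["exe"], record["cmdline"]) for record in exec_records)
    total = sum(pairs.values())
    ranked = sorted(pairs.items(), key=lambda item: item[1])
    return [(exe, cmd, count, count / total) for (exe, cmd), count in ranked[:top]]


# Hypothetical sample rows.
sample_patterns = [
    {"exe": "/bin/ls", "cmdline": "ls -la"},
    {"exe": "/bin/ls", "cmdline": "ls -la"},
    {"exe": "/bin/ls", "cmdline": "ls -la"},
    {"exe": "/usr/bin/curl", "cmdline": "curl http://example.com/tool.sh"},
]
rare = rarest_patterns(sample_patterns)
for exe, cmd, count, share in rare:
    print(exe, cmd, count, share)
```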
This query identifies commands that are known to be used by adversaries once they have compromised a host.
Hunt Hypothesis
An attacker will use well known programs on a pod or AKS VMSS worker node to exfiltrate data or expand their foothold on the cluster
Lateral Movement
Move laterally to cloud resources by calling IMDS server
An attacker might move laterally to other resources in your Azure subscription by using curl to retrieve Managed Service Identity (MSI) access tokens from the IMDS service. The following API requests to the IMDS service are worth investigating if you see them on your cluster:
Request for metadata on the VM
Hunt Hypothesis
An attacker looking to move laterally to cloud resources accessible to the cluster will make a request to the IMDS server to retrieve MSIs attached to the underlying AKS worker nodes
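A rough sketch of this hunt on exported command lines is shown below. It relies on the IMDS service being reachable at 169.254.169.254, with access tokens served under /metadata/identity/oauth2/token and VM metadata under /metadata/instance; the function name, the category labels and the sample command lines are hypothetical.

```python
import re
from urllib.parse import urlparse

IMDS_HOST = "169.254.169.254"
URL_RE = re.compile(r"https?://[^\s'\"]+")


def classify_imds_request(cmdline):
    """Return 'token', 'instance-metadata', 'other-imds', or None if IMDS is not contacted."""
    for url in URL_RE.findall(cmdline):
        parsed = urlparse(url)
        if parsed.hostname != IMDS_HOST:
            continue
        if parsed.path.startswith("/metadata/identity/oauth2/token"):
            return "token"
        if parsed.path.startswith("/metadata/instance"):
            return "instance-metadata"
        return "other-imds"
    return None


# Hypothetical command lines as they might appear in auditd records.
sample_cmdlines = [
    'curl -H Metadata:true "http://169.254.169.254/metadata/identity/oauth2/token'
    '?api-version=2018-02-01&resource=https://management.azure.com/"',
    "curl -H Metadata:true http://169.254.169.254/metadata/instance?api-version=2021-02-01",
    "curl https://example.com/index.html",
]
imds_results = [classify_imds_request(cmd) for cmd in sample_cmdlines]
print(imds_results)
```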
Anomalous requests to the K8s API Server
If an attacker has gained access to your cluster and is trying to escalate their privileges and move laterally, it is likely that they will make API requests to the Kubernetes API server that are unusual for that cluster. These could include requests to execute commands in the cluster or to create new roles and role bindings.
Hunt Hypothesis
The request patterns of an attacker with the ability to communicate with your AKS cluster Kubernetes API server will differ from the baseline requests made by the production workloads running on your cluster.
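One simple way to compare request patterns against a baseline is to reduce each audit event to a signature and report signatures that appear only in the hunting window. The sketch below assumes events in the Kubernetes audit format; the choice of signature fields and the sample events are assumptions you may want to change.

```python
def request_signature(event):
    """Reduce an audit event to (username, verb, resource, subresource)."""
    ref = event.get("objectRef") or {}
    return (
        (event.get("user") or {}).get("username"),
        event.get("verb"),
        ref.get("resource"),
        ref.get("subresource"),
    )


def new_request_patterns(baseline_events, hunt_events):
    """Return signatures seen in the hunt window but never in the baseline, in first-seen order."""
    baseline = {request_signature(event) for event in baseline_events}
    seen = set()
    unseen_before = []
    for event in hunt_events:
        signature = request_signature(event)
        if signature not in baseline and signature not in seen:
            seen.add(signature)
            unseen_before.append(signature)
    return unseen_before


# Hypothetical events.
app_user = {"username": "system:serviceaccount:default:app"}
baseline_sample = [
    {"user": app_user, "verb": "get", "objectRef": {"resource": "configmaps"}},
]
hunt_sample = [
    {"user": app_user, "verb": "get", "objectRef": {"resource": "configmaps"}},
    {"user": app_user, "verb": "create", "objectRef": {"resource": "clusterrolebindings"}},
    {"user": app_user, "verb": "create", "objectRef": {"resource": "clusterrolebindings"}},
]
new_patterns = new_request_patterns(baseline_sample, hunt_sample)
print(new_patterns)
```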
Persistence
Unusual Kubernetes objects being deployed
Kubernetes is composed of different entities called objects. Attackers may deploy different types of objects that are not normally present on your cluster. For example, an attacker might deploy Kubernetes objects like DaemonSets and Deployments that allow an attacker to affect all pods on your cluster, in contrast to a Pod object which only allows an attacker to control a single pod.
Additionally, objects like DaemonSets and Deployments allow an attacker to bypass some of the configuration change restrictions that prevent someone from updating the state of a Pod.
Hunt Hypothesis
An attacker looking for persistence on your cluster will deploy Kubernetes objects like DaemonSets and Deployments to get a foothold on all pods on your cluster.
Stack Counting Kubernetes Objects that are deployed
This query is a simple stack count of the different types of Kubernetes objects deployed in your cluster. It will likely be clear from the line chart which objects are regularly created as part of your cluster's operation. From here, you can remove these objects from the graph until you have zoomed in on the most irregularly deployed or "spiky" Kubernetes objects.
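The step of removing regularly created objects can be sketched as follows: bucket create events by day and keep only the object types that were not created on every day in the window. The event layout (timestamp and resource fields), the ISO 8601 timestamp assumption and the sample data are hypothetical, and "created every day" is only one possible definition of regular.

```python
from collections import defaultdict


def irregular_object_kinds(create_events):
    """Return object types that were not created on every day in the window."""
    days_per_kind = defaultdict(set)
    all_days = set()
    for event in create_events:
        # Assumes ISO 8601 timestamps such as 2021-06-01T12:00:00Z.
        day = event["timestamp"][:10]
        days_per_kind[event["resource"]].add(day)
        all_days.add(day)
    return sorted(
        kind for kind, days in days_per_kind.items() if len(days) < len(all_days)
    )


# Hypothetical create events over three days.
sample_creates = [
    {"timestamp": "2021-06-01T08:00:00Z", "resource": "pods"},
    {"timestamp": "2021-06-02T08:00:00Z", "resource": "pods"},
    {"timestamp": "2021-06-02T23:14:00Z", "resource": "daemonsets"},
    {"timestamp": "2021-06-03T08:00:00Z", "resource": "pods"},
]
irregular = irregular_object_kinds(sample_creates)
print(irregular)
```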
Container image poisoning
An attacker with access to your container image registry credentials might update an existing container image to give them a backdoor. If you are not using signed Docker images, this is trivially easy for an attacker to do.
One way to identify an image poisoning attack is to look for a new image being pushed to Azure Container Registry that results in two SHA hashes corresponding to a single image version. Depending on how your cluster's pods are configured to pull images, it might also be interesting to look for unexpected pod restarts following a new image being pushed to your container registry. To identify this activity, you will need to enable specific logging on your ACR as described here. You can then use the ContainerRegistryRepositoryEvents table to find rogue images being pushed.
Hunting Hypothesis
An attacker will update the contents of an existing image without updating the image version, to align with the pod's existing configuration.
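The "two hashes for one version" check can be sketched by grouping push events by repository and tag and reporting any tag associated with more than one digest. The field names repository, tag and digest and the sample events are assumptions for illustration; map them to the columns available in your registry logs.

```python
from collections import defaultdict


def tags_with_multiple_digests(push_events):
    """Map (repository, tag) to its digests when more than one digest was pushed."""
    digests = defaultdict(set)
    for event in push_events:
        digests[(event["repository"], event["tag"])].add(event["digest"])
    return {key: sorted(values) for key, values in digests.items() if len(values) > 1}


# Hypothetical push events; the digests are shortened placeholders.
sample_pushes = [
    {"repository": "team/api", "tag": "1.2.0", "digest": "sha256:aaa"},
    {"repository": "team/api", "tag": "1.2.0", "digest": "sha256:bbb"},
    {"repository": "team/web", "tag": "3.0.1", "digest": "sha256:ccc"},
]
suspicious_tags = tags_with_multiple_digests(sample_pushes)
print(suspicious_tags)
```

Mutable tags such as latest will legitimately appear here, so it is usually worth excluding them or reviewing them separately.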
Baseline of container images and image registries
Before diving too deep into the behaviour of your cluster, it is useful to run a quick baseline of the container images and container registries being leveraged by your cluster. This helps build your mental model of what containerized workloads are running on your cluster.
While doing this, it's worth noting that certain container images are more valuable to an attacker than others. These include container images like busybox, alpine and ubuntu, and specialized images with offensive security tools installed. If your production Kubernetes workloads mostly consist of custom container images pulled from your private container registry, unexpected container images and/or container registries easily stand out.
The following query is a simple baseline of the different container images that have been deployed on your cluster, as well as the container registries they have been pulled from. You can use this as a starting point for looking for any container images that may have been pushed to the cluster by an adversary.
Hunt Hypothesis
An adversary will deploy container images that allow them to install tooling and expand their reach on the cluster. Attacker-deployed containers might use a different container registry or image from those normally used by your containerized workloads.
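A baseline like this can also be built in Python by splitting each image reference into its registry and repository and flagging anything pulled from outside an allow list. The sketch follows the common Docker convention that the first path component is a registry only if it contains a dot or colon or is localhost, defaulting to docker.io otherwise; the registry name myacr.azurecr.io and the sample images are hypothetical.

```python
def split_image(image):
    """Split an image reference into (registry, repository), dropping any tag or digest."""
    first, separator, rest = image.partition("/")
    if separator and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", image
    # Drop a digest first, then a tag (a colon after the last slash).
    remainder = remainder.split("@", 1)[0]
    name, colon, tag = remainder.rpartition(":")
    repository = name if colon and "/" not in tag else remainder
    return registry, repository


def unexpected_images(images, allowed_registries):
    """Return images whose registry is not in the allow list."""
    return [image for image in images if split_image(image)[0] not in allowed_registries]


# Hypothetical images observed on a cluster.
sample_images = ["myacr.azurecr.io/team/api:1.2.0", "alpine:3.14", "busybox"]
unexpected = unexpected_images(sample_images, {"myacr.azurecr.io"})
print(unexpected)
```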