GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/azure-hunting/Azure Kubernetes Service Guided Hunting.ipynb
Kernel: Python 3.8 - Pytorch and Tensorflow

Azure Kubernetes Service (AKS) Hunting

Notebook Details

Python Version: 3.8

Platforms Supported: Azure Machine Learning (AML) and Visual Studio Code

Data Sources Leveraged:

Kubernetes control plane logging. To turn this on, head to your AKS cluster in the Azure Portal and find the Diagnostic settings entry in the left sidebar. Choose + Add diagnostic setting and select the log categories you are interested in recording.
- kube-apiserver logs record every request made to your Kubernetes API server. They are useful for spotting attackers probing the cluster or attempting to make changes.
- kube-audit logs provide a time-ordered sequence of all of the actions taken in the cluster and are well suited to security auditing. They are a superset of the information contained in the kube-apiserver logs and include operations that are triggered inside the Kubernetes control plane. You can configure your audit policy using the audit.k8s.io/v1/Policy resource type by following the instructions here.
AKS VMSS auditd logging. Open Source projects like aks-auditd and Microsoft's OMS Agent for Linux can help you set this up.

Description

This notebook contains hunting hypotheses and queries you can use and expand upon to hunt for adversary activity on your Azure Kubernetes Service (AKS) cluster.

Contents

Prepare your notebook environment
Hunting Hypotheses and Queries
- Tips for hunting
- Initial Access
  - Command execution on your cluster's containers
- Privilege Escalation
  - Deployment of privileged containers with the intention to container escape onto AKS worker nodes
  - Pivoting on high risk host volume path mounts
- Execution
  - Container and worker node kernel activity (Syslog)
    - Stack counting all program execution on worker node and containers
    - Investigating potential anomalies in program and command line execution
    - Determine the distribution of program executions and arguments passed to it
    - Program execution associated with data exfiltration and program installations
- Lateral Movement
  - Move laterally to cloud resources by requesting access token from IMDS server
  - Anomalous requests to the Kubernetes API Server
- Persistence
  - Unusual Kubernetes objects being deployed
  - Container image poisoning
  - Baseline of container images and image registries

Prepare your environment

This notebook uses kqlmagic and datasets within your Log Analytics workspace to support your AKS hunting. The following section installs kqlmagic and authenticates to your Log Analytics workspace.

Install Pre-requisites

This section of the notebook installs kqlmagic which will be used in this notebook to execute native Kusto queries.

In [ ]:

%%capture
import sys
!{sys.executable} -m pip install Kqlmagic --no-cache-dir --upgrade
%reload_ext Kqlmagic

Connect to your Log Analytics Workspace

The following cell connects you to the Log Analytics Workspace (LAW) in your Azure tenant. Copy the code that the cell outputs and enter it on the DeviceLogin site.

⚠️ Please update the LOG_ANALYTICS_WORSPACE_ID to specify the Workspace ID of your Log Analytics Workspace. Your LAW ID can be found as described here.

In [ ]:

LOG_ANALYTICS_WORSPACE_ID = "YOUR LOG ANALYTICS WORKSPACE ID" 
%kql logAnalytics://code;workspace=LOG_ANALYTICS_WORSPACE_ID;

Hunting Hypotheses and Queries

When hunting for an adversary, the goal is not to enumerate every tactic, technique and procedure available to them. The goal is to find and examine the specific junctures an adversary would need to cross to execute a successful attack. To do this, we have provided some hunting queries you can use as signals that something interesting is happening, as well as some additional queries and context to help you dive deeper.

⚠️ All these queries serve as starting points for your hunting and investigations. They contain variables that can be expanded on and tweaked to be more applicable to your own environment!

Tips for Hunting

When creating your hunt hypotheses, think about what actions an adversary would need to perform to attack your workloads.
Pay attention to the User Agent used to make requests to Azure Resource Manager (ARM), and in this case, the Kubernetes API. For example, for production workloads, it might be unusual for you to interact with your cluster using kubectl or the Azure CLI. If you see this activity, it is a signal that someone is performing hands-on-keyboard activity on your cluster or subscription.
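
As a rough illustration of the user agent tip, the following Python sketch flags user agents that look like interactive tooling. The prefixes are assumptions chosen for illustration (kubectl usually reports a user agent beginning with `kubectl/`, and the Azure CLI prefix shown here is an assumed value); check them against what you actually see in your own logs.

```python
# Hypothetical prefixes for interactive tooling. Adjust to your environment.
INTERACTIVE_AGENT_PREFIXES = ("kubectl/", "azurecli/")


def is_interactive_user_agent(user_agent: str) -> bool:
    """Return True when the user agent looks like hands-on-keyboard tooling."""
    # str.startswith accepts a tuple of prefixes; compare case-insensitively.
    return user_agent.strip().lower().startswith(INTERACTIVE_AGENT_PREFIXES)


print(is_interactive_user_agent("kubectl/v1.21.2 (linux/amd64) kubernetes/092fbfb"))
print(is_interactive_user_agent("kube-controller-manager/v1.21.2"))
```

A prefix check is deliberately simple. User agents are client-supplied and easy to spoof, so treat a match as a signal to investigate rather than as proof.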

Initial access

Command execution on your cluster's containers

Hunt Hypothesis

Before an attacker can execute commands on your pod, whether to look for secrets or to escape onto the underlying host, they must first exec into the container.

This hypothesis allows us to look for an adversary at a key juncture of their attack. Using kubectl exec to execute commands on your container is advantageous to an attacker:

- The commands being run aren't always logged and are less visible than commands specified in the container image.
- It gives them access to the service account tokens for that pod. By default, every pod has a service account token mounted, and the permissions of that service account are determined by its role bindings. In a production cluster, even on a single worker node, there is usually at least one pod with a mounted token for a service account that is bound through a ClusterRoleBinding, which grants access to actions like creating pods or viewing secrets in all namespaces.

In [ ]:

%%kql
// Looks for command execution into pods. Executions by the aksProblemDetector user into the tunnel-front pod are filtered out because this is known-good, regular activity.
AzureDiagnostics
| extend log_s=parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend username = tostring(log_s["user"]["username"])
| extend userAgent = tostring(log_s["userAgent"])
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| where verb == "create" 
| where requestURI contains "/exec"
| where username != "aksProblemDetector" and requestURI !endswith "/exec?command=ls&container=tunnel-front&stderr=true&stdout=true&timeout=20s"
| summarize TimeStamps=make_set(TimeGenerated) by verb, resource, PodName=pod_s, requestURI, username, userAgent, SubscriptionId, tostring(objectRef)
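
To make the filtering logic of the query above easier to follow, here is a minimal Python sketch that applies the same conditions to a single audit record. It assumes the record is the JSON string stored in the `log_s` column. The sample record, including its username, is invented for illustration.

```python
import json

KNOWN_GOOD_USER = "aksProblemDetector"
KNOWN_GOOD_URI_SUFFIX = (
    "/exec?command=ls&container=tunnel-front&stderr=true&stdout=true&timeout=20s"
)


def is_suspicious_exec(log_s: str) -> bool:
    """Mirror the query: a create on an /exec URI that is not the known-good probe."""
    event = json.loads(log_s)
    uri = event.get("requestURI", "").lower()
    username = (event.get("user") or {}).get("username", "")
    if event.get("verb") != "create" or "/exec" not in uri:
        return False
    # Both conditions must hold, as in the KQL "where ... and ..." clause.
    return username != KNOWN_GOOD_USER and not uri.endswith(KNOWN_GOOD_URI_SUFFIX)


# Hypothetical audit record for an interactive shell in a pod.
sample = json.dumps({
    "verb": "create",
    "requestURI": "/api/v1/namespaces/default/pods/web-0/exec?command=sh&container=web&stdin=true&tty=true",
    "user": {"username": "example-user"},
    "userAgent": "kubectl/v1.21.2 (linux/amd64) kubernetes/092fbfb",
})
print(is_suspicious_exec(sample))
```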

Privilege Escalation

Container Escape

By default, a container is isolated from the host system's network and memory address space by the Linux kernel's cgroups and namespace features. If a pod is "privileged", its containers are essentially running without these isolation constructs, which gives the container nearly all the same access as processes running on the host.

This gives an attacker a number of advantages:

1. Access to secrets on the underlying worker node:

- User account secrets placed by kubeadm in /etc/kubernetes. Most other certificates are stored in /etc/kubernetes/pki.
- /etc/kubernetes/azure.json on the host worker node, which contains the service principal that has access (by default Contributor) to all resources in the MC_ resource group.
- The kubeconfig file on the worker VM, which contains the kubelet's service account token. This service account token has permissions to request all the cluster's secrets (depending on your RBAC configuration).
- Secrets in tmpfs, that is, those stored in memory on the worker node.

2. Allows an attacker to run applications directly on the host. This gives an adversary a stealthy backdoor to your cluster.

There are two primary methods for performing a container escape:

1. Mount the host file system and escalate privileges to get a full shell on the node. An attacker can do this by deploying a pod with one or more of the following privileged configurations:

- The pod's securityContext set to privileged.
- A privileged hostPath mount.
- An exposed Docker socket.
- The host process ID namespace exposed by setting hostPID to true in the pod's spec.

2. Exploit cgroups to get interactive root access on the node. A prerequisite for this attack is to exec into the container itself, which the hunting hypothesis above should find. Read this blog post for an example of a container escape exploiting the Linux cgroups v1 notify_on_release feature.

Hunt Hypothesis

An attacker looking for container escape will deploy a privileged container or modify an existing pod's configuration to give them elevated access to the host's process and network address space.

In [ ]:

%%kql
// Hunting query to find deployment of privileged pods
let lookbackStart = ago(50d);
let lookbackEnd = now();
let timeStep = 1d;
AzureDiagnostics
// Filter to time range you want to examine
| where TimeGenerated between(lookbackStart..lookbackEnd)
| extend log_s=parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| where verb == "create"
| where requestURI !contains "/exec"
| where resource == "pods"
| extend requestObject = log_s["requestObject"]
| extend spec = requestObject["spec"]
| extend containers = spec["containers"][0]
| extend username = tostring(log_s["user"]["username"])
| extend userAgent = tostring(log_s["userAgent"])
| project
    TimeGenerated,
    containerName=tostring(containers["name"]),
    containerImage=tostring(containers["image"]), 
    securityContext=tostring(containers["securityContext"]), 
    volumeMounts=tostring(containers["volumeMounts"]), 
    namespace=tostring(objectRef["namespace"]),
    username,
    userAgent, 
    containers, 
    requestObject, 
    objectRef, 
    spec
| where isnotempty(securityContext) 
// Filter for cases where the container has a privileged security context or the host process namespace is exposed
| where parse_json(todynamic(securityContext)["privileged"]) == "true" or parse_json(todynamic(spec)["hostPID"]) == "true" 
| summarize Count=count() by bin(TimeGenerated, timeStep), containerImage, namespace, containerName
| render timechart
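
The check at the heart of this query can also be sketched offline in Python against a pod spec, for example one taken from the `requestObject` of an audit record. This is an illustrative sketch rather than a replacement for the query. Unlike the query, which inspects only the first container (`spec["containers"][0]`), it walks every container in the spec.

```python
def privileged_findings(pod_spec: dict) -> list:
    """List the privileged settings found in a pod spec (illustrative only)."""
    findings = []
    # hostPID is a pod-level field that exposes the host process ID namespace.
    if pod_spec.get("hostPID") is True:
        findings.append("hostPID")
    for container in pod_spec.get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            findings.append("privileged:" + container.get("name", ""))
    return findings


# Hypothetical pod spec with a privileged second container.
example_spec = {
    "hostPID": True,
    "containers": [
        {"name": "app", "image": "nginx"},
        {"name": "debug", "image": "busybox", "securityContext": {"privileged": True}},
    ],
}
print(privileged_findings(example_spec))
```

Checking every container matters because an attacker can place the privileged container after a benign one in the same pod.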

Pivoting on high risk host volume path mounts

This query identifies containers deployed to your cluster that are configured in a way that exposes the underlying worker node's file system. This is a well-known configuration that enables container escape.

In [ ]:

%%kql
let highRiskHostVolumePaths = datatable (path: string) [
    "/",
    "/var/log",
    "/var/run/docker.sock"
];
let _startLookBack = ago(1d);
let _endLookBack = now();
AzureDiagnostics
| where TimeGenerated between (_startLookBack.._endLookBack)
| extend log_s=parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| where verb == "create"
| where requestURI !contains "/exec"
| where resource == "pods"
| extend spec = log_s["requestObject"]["spec"]
| extend containers = spec["containers"][0]
| extend hostVolumeMounts=spec["volumes"]
| where isnotempty(hostVolumeMounts) 
| mv-expand hostVolumeMount=hostVolumeMounts
| extend hostVolumeName = hostVolumeMount["name"], hostPath=hostVolumeMount["hostPath"]
| extend hostPathName=hostPath["path"], hostPathType=hostPath["type"]
| where isnotempty(hostPathName)
| where hostPathName has_any(highRiskHostVolumePaths) 
| project 
    TimeGenerated, 
    podName=tostring(objectRef["name"]), 
    containerName=tostring(containers["name"]), 
    containerImage=tostring(containers["image"]), 
    namespace=tostring(objectRef["namespace"]),  
    hostVolumeName, 
    hostPathName, 
    hostPathType, 
    securityContext=tostring(containers["securityContext"]), 
    volumeMounts=containers["volumeMounts"]
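
For comparison, the host path check can be sketched in Python. This sketch makes a different matching choice from the query, and that choice is an assumption of the example: a path is flagged when it equals a high-risk path or sits beneath one, with `/` matched only exactly. The query itself uses `has_any`, which matches on terms instead.

```python
import posixpath

HIGH_RISK_HOST_PATHS = ("/", "/var/log", "/var/run/docker.sock")


def risky_host_paths(pod_spec: dict) -> list:
    """Return (volume name, host path) pairs that expose high-risk host paths."""
    risky = []
    for volume in pod_spec.get("volumes", []):
        host_path = (volume.get("hostPath") or {}).get("path")
        if not host_path:
            continue  # not a hostPath volume (configMap, secret, emptyDir, ...)
        normalized = posixpath.normpath(host_path)
        for candidate in HIGH_RISK_HOST_PATHS:
            if candidate == "/":
                matched = normalized == "/"
            else:
                matched = normalized == candidate or normalized.startswith(candidate + "/")
            if matched:
                risky.append((volume.get("name", ""), normalized))
                break
    return risky


# Hypothetical pod spec volumes.
example_volumes = {
    "volumes": [
        {"name": "host", "hostPath": {"path": "/"}},
        {"name": "logs", "hostPath": {"path": "/var/log/pods"}},
        {"name": "data", "hostPath": {"path": "/data"}},
        {"name": "cfg", "configMap": {"name": "app-config"}},
    ]
}
print(risky_host_paths(example_volumes))
```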

Execution

Container and worker node kernel activity (Syslog)

The following queries require you to enable auditd logging on your AKS worker node VMSS. Open Source projects like aks-auditd and Microsoft's OMS Agent for Linux can help you set this up.

auditd logging provides you with an easy and highly configurable way to gain visibility into kernel-level activity on your AKS worker nodes and containers. If you are running a multi-tenant cluster, having visibility into your AKS worker node activity is critical, and the Kubernetes API server logs aren't always enough. auditd provides a good solution to this.

In this case, we are going to use this audit logging to view syscall activity, primarily program executions.

Hunting Hypothesis

The activity of an attacker with the ability to execute commands on an AKS container or VMSS worker node will differ from the baseline activity of your AKS cluster.

Stack counting all program execution on worker node and containers

This query stack counts all the programs started following a syscall call to execve (59). This gives us a high level overview of the processes running on our cluster.

Some interesting program executions to look out for include the following:

- azcopy and tar - Can be used to exfiltrate credentials to attacker-owned blob storage
- curl and wget - Can be used to install executables on the host

In [ ]:

%%kql
let _startTime = ago(30d);
let _endTime = now();
Syslog
| where TimeGenerated between(_startTime.._endTime)
| where Facility == "authpriv"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
| where EventInfo["SYSCALL"]["syscall"] == "59"
| summarize Total=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated) by tostring(EventInfo["SYSCALL"]["exe"])
| order by Total asc
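
The `parse`, `extract_all` and `make_bag` steps in this query turn each auditd message into a bag of key-value pairs. A minimal Python sketch of the same idea follows. The sample message is invented for illustration, and real auditd records carry many more fields.

```python
import re

# Matches the record header: type=<TYPE> msg=audit(<EVENT ID>): <rest>
HEADER = re.compile(r"type=(?P<type>\w+) msg=audit\((?P<event_id>[^)]+)\): (?P<info>.*)")
# Matches key=value pairs where the value is either quoted or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_audit_record(message: str):
    """Return (type, event id, fields) for an auditd message, or None if it does not parse."""
    match = HEADER.search(message)
    if match is None:
        return None
    fields = {}
    for key, raw, quoted in PAIR.findall(match.group("info")):
        # Strip the surrounding quotes from quoted values.
        fields[key] = quoted if raw.startswith('"') else raw
    return match.group("type"), match.group("event_id"), fields


# Hypothetical SYSCALL record for an execve (syscall 59) of curl.
sample_message = (
    'audispd: type=SYSCALL msg=audit(1627384950.123:456): arch=c000003e '
    'syscall=59 success=yes exit=0 comm="curl" exe="/usr/bin/curl"'
)
record_type, event_id, fields = parse_audit_record(sample_message)
print(record_type, event_id, fields["syscall"], fields["exe"])
```

Records that share an event ID belong to the same kernel event, which is why the query groups them with `summarize ... by EventID` before filtering on the syscall number.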

Investigating potential anomalies in program and command line execution

The following query isolates program execution and the command line arguments passed to the program. We can use this to find anomalous programs executed by the host and any deviations from how this program is normally used.

In [ ]:

%%kql
let _startTime = ago(30d);
let _endTime = now();
let TimeSeriesAnomalies = Syslog
| where TimeGenerated between(_startTime.._endTime)
| where Facility == "authpriv"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
| where EventInfo["SYSCALL"]["syscall"] == "59"
| extend Exe = tostring(EventInfo["SYSCALL"]["exe"])
| extend CommandLine = EventInfo["EXECVE"]
| mv-apply CommandLine on ( 
     extend key = tostring(bag_keys(CommandLine)[0])
    | where key matches regex  @"^a\d+$"
    | parse CommandLine[key] with '"' Argument '"'
    | project Argument = iff(indexof(Argument, " ") >= 0, CommandLine[key], Argument)
    | summarize CommandLine = make_list(Argument, 50)
    | extend CommandLine = strcat_array(CommandLine, " ")
)
|make-series Total=count() on TimeGenerated from _startTime to _endTime step 1h by  Exe, CommandLine
| extend (anomalies, score, baseline) = series_decompose_anomalies(Total, 1.5, -1, 'linefit')
| mv-expand Total to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double),score to typeof(double), baseline to typeof(long);
TimeSeriesAnomalies
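
To build intuition for what the anomaly step is doing, here is a simplified Python sketch: fit a straight line to the series, then flag points whose residuals fall outside Tukey's fences. It is an approximation for illustration only. Kusto's `series_decompose_anomalies` differs in its details (for example in how it handles seasonality and in the percentile range it uses), so do not expect identical scores.

```python
import statistics


def linefit_tukey_anomalies(series, k=1.5):
    """Flag points as 1 (high), -1 (low) or 0 after removing a fitted linear trend."""
    n = len(series)
    if n < 2:
        return [0] * n
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    # Ordinary least squares fit of y = intercept + slope * x.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / var_x
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, series)]
    # Tukey's fences on the residuals, using the quartiles.
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [1 if r > high else -1 if r < low else 0 for r in residuals]


# Hypothetical hourly execution counts with a single spike at index 12.
hourly_counts = [10] * 12 + [100] + [10] * 11
flags = linefit_tukey_anomalies(hourly_counts)
print([i for i, flag in enumerate(flags) if flag != 0])
```

A lower `k` flags more points. The query's threshold of 1.5 plays the same role.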

Determine the distribution of program executions and arguments passed to it

In this scenario, we are using basket to find the patterns in what programs are executed and how they are used (command line arguments). In this query, we are ordering the output to show us the most infrequent program executions across your AKS cluster.

You can also pipe the output of the above program execution time series anomaly query to basket in order to enrich it with information on how frequent that pattern (executable name and command line arguments) was found.

In [ ]:

%%kql
let _startTime = ago(30d);
let _endTime = now();
Syslog
| where TimeGenerated between(_startTime.._endTime)
| where Facility == "user"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
| where EventInfo["SYSCALL"]["syscall"] == "59"
| extend Exe = tostring(EventInfo["SYSCALL"]["exe"])
| extend CommandLine = EventInfo["EXECVE"]
| mv-apply CommandLine on ( 
     extend key = tostring(bag_keys(CommandLine)[0])
    | where key matches regex  @"^a\d+$"
    | parse CommandLine[key] with '"' Argument '"'
    | project Argument = iff(indexof(Argument, " ") >= 0, CommandLine[key], Argument)
    | summarize CommandLine = make_list(Argument, 50)
    | extend CommandLine = strcat_array(CommandLine, " ")
)
| project HostName, Exe, CommandLine
| evaluate basket(0.001)
| order by Percent asc
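
As a rough mental model for the ordering step, the sketch below counts exact (host, executable, command line) combinations and sorts them by their share of all executions, rarest first. This is a plain stack count and much simpler than `basket`, which also mines patterns over subsets of the columns. The rows are invented for illustration.

```python
from collections import Counter


def pattern_frequencies(rows):
    """Return (pattern, count, percent) tuples sorted from rarest to most common."""
    counts = Counter(rows)
    total = sum(counts.values())
    ranked = [
        (pattern, count, 100.0 * count / total)
        for pattern, count in counts.items()
    ]
    return sorted(ranked, key=lambda item: item[2])


# Hypothetical (HostName, Exe, CommandLine) rows.
rows = (
    [("node-0", "/usr/bin/runc", "runc init")] * 8
    + [("node-0", "/bin/sh", "sh -c date")] * 3
    + [("node-1", "/usr/bin/curl", "curl http://example.com/payload")]
)
for pattern, count, percent in pattern_frequencies(rows):
    print(count, round(percent, 1), pattern)
```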

Program execution associated with data exfiltration and program installations

This query identifies commands that are known to be used by adversaries once they have compromised a host.

Hunt Hypothesis

An attacker will use well-known programs on a pod or AKS VMSS worker node to exfiltrate data or expand their foothold on the cluster.

In [ ]:

%%kql
let riskyCommands = datatable(command: string)[
    "azcopy",
    "wget",
    "tar",
    "curl"
];
Syslog
| where Facility == "authpriv"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
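// Keep all execve (59) events so that every program in riskyCommands is considered, not only curl
| where EventInfo["SYSCALL"]["syscall"] == "59"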
| mv-expand bagexpansion=array Arg=EventInfo["EXECVE"]
| summarize arg_min(TimeGenerated, HostName, EventInfo), Args=strcat_array(make_list_if(Arg[1], Arg[0] != "argc", 20), " ") by EventID
| where Args has_any(riskyCommands)
| project EventID, TimeGenerated, HostName, Args, CurrentWorkingDirectory=tostring(EventInfo["CWD"]["cwd"]), Path=tostring(EventInfo["PATH"]["name"])
| summarize TotalRequests=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated) by HostName, Args
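
auditd EXECVE records store the command line as numbered fields (`a0`, `a1`, ...) alongside `argc`. The following Python sketch shows one way to reassemble them in order and test the program name against the risky command list. It is stricter than the query by design: the query matches a risky command anywhere in the arguments, while this sketch looks only at the program name in `a0`.

```python
import posixpath
import re

RISKY_COMMANDS = {"azcopy", "wget", "tar", "curl"}
ARG_KEY = re.compile(r"^a(\d+)$")


def command_line(execve_fields: dict) -> list:
    """Return the arguments of an EXECVE record in numeric order (a0, a1, ..., a10, ...)."""
    numbered = []
    for key, value in execve_fields.items():
        match = ARG_KEY.match(key)
        if match:  # skips argc and any other non-argument field
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered)]


def runs_risky_command(execve_fields: dict) -> bool:
    """True when the executed program (a0) is one of the risky commands."""
    args = command_line(execve_fields)
    return bool(args) and posixpath.basename(args[0]) in RISKY_COMMANDS


# Hypothetical EXECVE fields.
example_execve = {"argc": "3", "a0": "/usr/bin/curl", "a2": "http://example.com/tool", "a1": "-sO"}
print(command_line(example_execve), runs_risky_command(example_execve))
```

Sorting on the numeric suffix avoids the ordering problem of sorting the keys as strings, where `a10` would sort before `a2`.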

Lateral Movement

Move laterally to cloud resources by calling IMDS server

An attacker might move laterally to other resources in your Azure subscription by using curl to retrieve Managed Service Identity (MSI) access tokens from the Azure Instance Metadata Service (IMDS). The following API requests to the IMDS service are worth investigating if you see them on your cluster:

- Requests for metadata on the VM
- Requests for access tokens

Hunt Hypothesis

An attacker looking to move laterally to cloud resources accessible to the cluster will make a request to the IMDS server to retrieve access tokens for the MSIs attached to the underlying AKS worker nodes.

In [ ]:

%%kql
Syslog
| where Facility == "authpriv"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
| where EventInfo["PATH"]["name"] has "curl"
| mv-expand bagexpansion=array Arg=EventInfo["EXECVE"]
| where Arg[0] != "argc"
| where Arg[1] contains "169.254.169.254"
| extend IMDSURL = tostring(Arg[1])
| summarize TotalRequests=count(), FirstSeen=min(TimeGenerated), LastSeen=max(TimeGenerated), Hosts=make_set(HostName) by IMDSURL

In [ ]:

%%kql
Syslog
| where Facility == "authpriv"
| where ProcessName == "audispd"
| parse SyslogMessage with * "type=" type " msg=audit(" EventID "): " info
| extend KeyValuePairs = array_concat(
    extract_all(@"([\w\d]+)=([^ ]+)", info),
    extract_all(@"([\w\d]+)=""([^""]+)""", info))
| mv-apply KeyValuePairs on 
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize Info=make_bag(p)
)
| summarize arg_min(TimeGenerated, HostName), EventInfo=make_bag(pack(type, Info)) by EventID
| where EventInfo["PATH"]["name"] has "curl"
| mv-expand bagexpansion=array Arg=EventInfo["EXECVE"]
| summarize arg_min(TimeGenerated, HostName, EventInfo), Args=strcat_array(make_list_if(Arg[1], Arg[0] != "argc", 20), " ") by EventID
| where Args contains "http://169.254.169.254/metadata/identity/oauth2/token"
| project EventID, TimeGenerated, HostName, Args, CurrentWorkingDirectory=tostring(EventInfo["CWD"]["cwd"]), Path=tostring(EventInfo["PATH"]["name"])
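
The two kinds of IMDS request mentioned above can be told apart by URL path. Here is a small Python sketch that classifies a single command line argument. The category names are labels made up for this example, and the function only recognises the instance metadata and identity token paths.

```python
from urllib.parse import urlsplit

IMDS_HOST = "169.254.169.254"


def classify_imds_request(argument: str):
    """Return a label for an IMDS URL, or None when the argument does not target IMDS."""
    candidate = argument if "://" in argument else "http://" + argument
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None  # not something that parses as a URL
    if parts.hostname != IMDS_HOST:
        return None
    if parts.path.startswith("/metadata/identity/oauth2/token"):
        return "access_token"
    if parts.path.startswith("/metadata/instance"):
        return "instance_metadata"
    return "other"


token_url = (
    "http://169.254.169.254/metadata/identity/oauth2/token"
    "?api-version=2018-02-01&resource=https://management.azure.com/"
)
print(classify_imds_request(token_url))
print(classify_imds_request("http://169.254.169.254/metadata/instance?api-version=2021-02-01"))
print(classify_imds_request("-H"))
```

Token requests deserve the closest attention, because a returned token can be replayed against Azure APIs from outside the cluster until it expires.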

Anomalous requests to the K8s API Server

If an attacker has gained access to your cluster and is trying to escalate their privileges and move laterally, it is likely that they will make API requests to the Kubernetes API server that are unusual for that cluster. These could include requests to execute commands in the cluster's containers or to create new roles and role bindings.

Hunt Hypothesis

The request patterns of an attacker with the ability to communicate with your AKS cluster Kubernetes API server will differ from the baseline requests made by the production workloads running on your cluster.

In [ ]:

%%kql
let _startTime = ago(30d);
let _endTime = ago(1d);
let _timestep = 1h;
let _totalEventThreshold = 5;
let DiagnosticEvents = AzureDiagnostics
| where TimeGenerated between (_startTime.._endTime)
| extend TimeBucket = bin(TimeGenerated, _timestep)
| extend log_s = parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend resourceName = tostring(objectRef["name"])
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| extend username = tostring(log_s["user"]["username"])
| extend userAgent = tostring(log_s["userAgent"])
| where isnotempty(resourceName)
| where isnotempty(username);
let K8sAPIRequestAnomalies = DiagnosticEvents
| make-series Total = count() on TimeBucket from bin(_startTime, _timestep) to bin(_endTime, _timestep)+_timestep step _timestep by verb, resource, username, userAgent
// More documentation on the series_decompose_anomalies Kusto function can be found here https://docs.microsoft.com/azure/data-explorer/kusto/query/series-decompose-anomaliesfunction
| extend (anomalyFlag, anomalyScore, expectedValue) = series_decompose_anomalies(Total, 5, -1, 'linefit', 0, "ctukey")
| mv-expand Total to typeof(double), TimeBucket to typeof(datetime), anomalyFlag to typeof(double), anomalyScore to typeof(double), expectedValue to typeof(double)
| where anomalyFlag > 0 
| where Total > _totalEventThreshold
| order by anomalyScore desc;
K8sAPIRequestAnomalies
| lookup kind=inner DiagnosticEvents on TimeBucket, verb, resource, username, userAgent
//| project TimeGenerated, verb, resource, resourceName, username, requestURI, userAgent, Total, expectedValue, anomalyFlag, anomalyScore
| summarize Events=count(), RequestURIs=make_set(requestURI), resourceNames=make_set(resourceName), expectedValue=any(expectedValue), anomalyScore=any(anomalyScore) by TimeBucket, verb, resource, username, userAgent

Persistence

Unusual Kubernetes objects being deployed

Kubernetes is composed of different entities called objects. Attackers may deploy different types of objects that are not normally present on your cluster. For example, an attacker might deploy Kubernetes objects like DaemonSets and Deployments that allow an attacker to affect all pods on your cluster, in contrast to a Pod object which only allows an attacker to control a single pod.

Additionally, objects like DaemonSets and Deployments allow an attacker to bypass some of the configuration change restrictions that prevent someone from updating the state of a Pod.

Hunt Hypothesis

An attacker looking for persistence on your cluster will deploy Kubernetes objects like DaemonSets and Deployments to get a foothold on all pods on your cluster.

Stack Counting Kubernetes Objects that are deployed

This query is a simple stack count of the different types of Kubernetes objects deployed in your cluster. It will likely be clear from the line chart which objects are regularly created as part of your cluster's operation. From here, you can remove these objects from the graph until you have zoomed in on the most irregularly deployed or "spiky" Kubernetes objects.

In [ ]:

%%kql
// Ignore resource creations that are unlikely to be used by an adversary for persistence
let ignoredResources = datatable(type:string)[
"tokenreviews",
"events",
"subjectaccessreviews",
"selfsubjectaccessreviews",
"storageclasses"
];
let _startLookBack = ago(50d);
let _endLookBack = now();
let _stepTime = 1h;
AzureDiagnostics
| where TimeGenerated between(_startLookBack.._endLookBack)
| extend log_s=parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| extend name=tostring(objectRef["name"])
| where verb == "create"
| where resource !in(ignoredResources)
| summarize Count=count() by bin(TimeGenerated, _stepTime),resource
| render timechart
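
One way to follow up on the chart is to compare a recent window against a baseline and list the resource types that were created only recently. The Python sketch below illustrates the idea on already-parsed events. The event dictionaries and their `verb` and `resource` keys are assumptions modelled on the fields the query extracts.

```python
IGNORED_RESOURCES = {
    "tokenreviews",
    "events",
    "subjectaccessreviews",
    "selfsubjectaccessreviews",
    "storageclasses",
}


def created_resources(events) -> set:
    """Resource types that were created, excluding the ignored ones."""
    return {
        event["resource"]
        for event in events
        if event.get("verb") == "create"
        and event.get("resource")
        and event["resource"] not in IGNORED_RESOURCES
    }


def newly_seen_resources(baseline_events, recent_events) -> list:
    """Resource types created in the recent window but never in the baseline."""
    return sorted(created_resources(recent_events) - created_resources(baseline_events))


# Hypothetical parsed audit events.
baseline = [{"verb": "create", "resource": "pods"}, {"verb": "create", "resource": "events"}]
recent = [
    {"verb": "create", "resource": "pods"},
    {"verb": "create", "resource": "daemonsets"},
    {"verb": "get", "resource": "secrets"},
]
print(newly_seen_resources(baseline, recent))
```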

Container image poisoning

An attacker with access to your container image registry credentials might update an existing container image to give them a backdoor. If you are not using signed Docker images, this is trivially easy for an attacker to do.

One way to identify an image poisoning attack is to look for a new image being pushed to Azure Container Registry that results in two SHA hashes corresponding to a single image version. Depending on how your cluster's pods are configured to pull images, it might also be interesting to look for unexpected restarts of pods following a new image being pushed to your container registry. To identify this activity, you will need to enable specific logging on your ACR as described here. You can then use the ContainerRegistryRepositoryEvents table to find rogue images being pushed.

Hunting Hypothesis

An attacker will update the contents of an existing image without updating the image version, to align with the pod's existing configuration.
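
The "one version, two hashes" signal can be sketched in Python as a grouping problem: collect the digests seen for each repository and tag, and report any tag with more than one. The event dictionaries and their `repository`, `tag` and `digest` keys are hypothetical names for this example. Map them to the columns you actually find in your ContainerRegistryRepositoryEvents data.

```python
from collections import defaultdict


def tags_with_multiple_digests(push_events) -> dict:
    """Map (repository, tag) to its digests when more than one digest was pushed."""
    digests = defaultdict(set)
    for event in push_events:
        digests[(event["repository"], event["tag"])].add(event["digest"])
    return {key: sorted(values) for key, values in digests.items() if len(values) > 1}


# Hypothetical push events: the 1.4.2 tag of "web" was pushed with two different digests.
pushes = [
    {"repository": "web", "tag": "1.4.2", "digest": "sha256:aaa"},
    {"repository": "web", "tag": "1.4.2", "digest": "sha256:bbb"},
    {"repository": "web", "tag": "1.4.3", "digest": "sha256:ccc"},
    {"repository": "api", "tag": "2.0.0", "digest": "sha256:ddd"},
]
print(tags_with_multiple_digests(pushes))
```

Mutable tags such as `latest` will show up here during normal operation, so this works best for registries where version tags are expected to be immutable.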

Baseline of container images and image registries

Before diving too deep into the behaviour of your cluster, it is useful to run a quick baseline of the container images and container registries being leveraged by your cluster. This helps build your mental model of what containerized workloads are running on your cluster.

While doing this, it's worth noting that certain container images are more valuable to an attacker than others. These include container images like busybox, alpine and ubuntu, as well as specialized images with offensive security tools installed. If your production Kubernetes workloads consist mostly of custom container images pulled from your private container registry, unexpected container images and container registries stand out easily.

The following query is a simple baseline of the different container images that have been deployed on your cluster, as well as the container registries they have been pulled from. You can use this as a starting point for looking for any container images that may have been pushed to the cluster by an adversary.

Hunt Hypothesis

An adversary will deploy container images that allow them to install tooling and expand their reach on the cluster. Attacker-deployed containers might use a different container registry or image from those normally used by your containerized workloads.

In [ ]:

%%kql 
// You can include your private container registry server here if you want to exclude images pulled from there
let _trustedContainerRegistries = datatable ( registry: string)[
"mcr.microsoft.com",
];
let _startLookBack = ago(7d);
let _endLookBack = now();
let _timeStep = 1d;
AzureDiagnostics
| extend log_s=parse_json(log_s)
| extend verb = tostring(log_s["verb"])
| extend objectRef = log_s["objectRef"]
| extend requestURI = tostring(log_s["requestURI"])
| extend resource = tostring(objectRef["resource"])
| where verb == "create"
| where requestURI !contains "/exec"
| where resource == "pods"
| extend requestObject = log_s["requestObject"]
| extend spec = requestObject["spec"]
| extend containers = spec["containers"][0]
// Additional fields are included here if you want more context in a table output, rather than a timechart
| project
    TimeGenerated,
    containerName=tostring(containers["name"]),
    containerImage=tostring(containers["image"]), 
    securityContext=tostring(containers["securityContext"]), 
    volumeMounts=tostring(containers["volumeMounts"]), 
    namespace=tostring(objectRef["namespace"]), 
    containers, 
    objectRef 
| where isnotempty(containerImage)
| where not(containerImage has_any(_trustedContainerRegistries))
| make-series Count=count() on TimeGenerated from _startLookBack to _endLookBack step _timeStep by containerImage
| render timechart
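
When reviewing the results, it can help to separate the registry from the rest of the image reference. The Python sketch below uses a common convention for doing so, assumed here rather than taken from the notebook: the first path component is treated as a registry host only if it contains a dot or a colon, or is `localhost`. Otherwise the image is treated as coming from Docker Hub.

```python
TRUSTED_REGISTRIES = {"mcr.microsoft.com"}
DEFAULT_REGISTRY = "docker.io"


def registry_of(image: str) -> str:
    """Return the registry host of an image reference."""
    first, separator, _ = image.partition("/")
    # Without a slash there is no registry component, e.g. "busybox:1.33".
    if separator and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return DEFAULT_REGISTRY


def is_untrusted(image: str) -> bool:
    """True when the image does not come from a trusted registry."""
    return registry_of(image) not in TRUSTED_REGISTRIES


for image in ("mcr.microsoft.com/oss/kubernetes/pause:3.5", "busybox", "example.azurecr.io/app:1.0"):
    print(image, registry_of(image), is_untrusted(image))
```

Comparing the parsed registry exactly is stricter than the query's `has_any` match on the whole image string, which would also accept an image whose path merely contains a trusted registry name.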
