Azure
GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/tutorials-and-examples/example-notebooks/Example - Guided Investigation - Process-Alerts.ipynb
Kernel: Python 3

Title: Alert Investigation (Windows Process Alerts)

Notebook Version: 1.0
Python Version: Python 3.10 (including Python 3.10 - SDK v2 - AzureML)
Required Packages: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, scikit_learn
Platforms Supported:

  • Azure Notebooks Free Compute

  • Azure Notebooks DSVM

  • OS Independent

Data Sources Required:

  • Log Analytics - SecurityAlert, SecurityEvent (EventIDs 4688 and 4624/25)

  • (Optional) - VirusTotal (with API key)

Description:

This notebook is intended for triage and investigation of security alerts. It is specifically targeted at alerts triggered by suspicious process activity on Windows hosts. Some of the sections will work on other types of alerts but this is not guaranteed.

Warning: Example Notebook - No longer supported!

 This notebook is meant to be illustrative of specific scenarios and is not actively maintained.
 It is unlikely to be runnable directly in your environment. Instead, please use the notebooks in the root of this repo. 

Contents

Setup

  1. Make sure that you have installed packages specified in the setup (uncomment the lines to execute)

  2. There are some manual steps up to selecting the alert ID. After this most of the notebook can be executed sequentially

  3. Major sections should be executable independently (e.g. Alert Command line and Host Logons can be run skipping Session Process Tree)

Install Packages

The first time this cell runs for a new Azure Notebooks project or local Python environment it will take several minutes to download and install the packages. In subsequent runs it should run quickly and confirm that package dependencies are already installed. Unless you want to upgrade the packages you can feel free to skip execution of the next cell.

If you see any import failures (ImportError) in the notebook, please re-run this cell and answer 'y', then re-run the cell where the failure occurred.

Note you may see some warnings about package incompatibility with certain packages. This does not affect the functionality of this notebook but you may need to upgrade the packages producing the warnings to a more recent version.

import sys
import warnings

warnings.filterwarnings("ignore", category=DeprecationWarning)

# Keep the minimum version as a tuple so it compares correctly with sys.version_info
MIN_REQ_PYTHON = (3, 10)
if sys.version_info < MIN_REQ_PYTHON:
    print(f'Check the Kernel->Change Kernel menu and ensure that Python {MIN_REQ_PYTHON[0]}.{MIN_REQ_PYTHON[1]}')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

# Package Installs - try to avoid if they are already installed
try:
    import msticpy.sectools as sectools
    import Kqlmagic
    print('If you answer "n" this cell will exit with an error in order to avoid the pip install calls,')
    print('This error can safely be ignored.')
    resp = input('msticpy and Kqlmagic packages are already loaded. Do you want to re-install? (y/n)')
    if resp.strip().lower() != 'y':
        sys.exit('pip install aborted - you may skip this error and continue.')
    else:
        print('After installation has completed, restart the current kernel and run '
              'the notebook again skipping this cell.')
except ImportError:
    pass

print('\nPlease wait. Installing required packages. This may take a few minutes...')
!pip install git+https://github.com/microsoft/msticpy --upgrade --user
!pip install Kqlmagic --no-cache-dir --upgrade --user
print('\nTo ensure that the latest versions of the installed libraries '
      'are used, please restart the current kernel and run '
      'the notebook again skipping this cell.')

Import Python Packages

Get WorkspaceId

To find your Workspace Id go to Log Analytics. Look at the workspace properties to find the ID.

# Imports
import sys
import warnings  # needed below even if the install cell was skipped
import numpy as np
from IPython import get_ipython
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
sns.set()
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)

import msticpy.sectools as sectools
import msticpy.nbtools as mas
import msticpy.nbtools.kql as qry
import msticpy.nbtools.nbdisplay as nbdisp

# Some of our dependencies (networkx) still use deprecated Matplotlib
# APIs - we can't do anything about it so suppress them from view
from matplotlib import MatplotlibDeprecationWarning
warnings.simplefilter("ignore", category=MatplotlibDeprecationWarning)
import os
from msticpy.nbtools.wsconfig import WorkspaceConfig

ws_config_file = 'config.json'
WORKSPACE_ID = None
TENANT_ID = None
try:
    ws_config = WorkspaceConfig(ws_config_file)
    display(Markdown(f'Read Workspace configuration from local config.json for workspace **{ws_config["workspace_name"]}**'))
    for cf_item in ['tenant_id', 'subscription_id', 'resource_group', 'workspace_id', 'workspace_name']:
        display(Markdown(f'**{cf_item.upper()}**: {ws_config[cf_item]}'))

    # Only accept the values if neither one is still a template placeholder
    if ('cookiecutter' not in ws_config['workspace_id'] and
            'cookiecutter' not in ws_config['tenant_id']):
        WORKSPACE_ID = ws_config['workspace_id']
        TENANT_ID = ws_config['tenant_id']
except Exception:
    # Any failure to read the config falls through to the manual prompt below
    pass

if not WORKSPACE_ID or not TENANT_ID:
    display(Markdown('**Workspace configuration not found.**\n\n'
                     'Please go to your Log Analytics workspace, copy the workspace ID'
                     ' and/or tenant Id and paste here.<br> '
                     'Or read the workspace_id from the config.json in your Azure Notebooks project.'))
    ws_config = None
    ws_id = mas.GetEnvironmentKey(env_var='WORKSPACE_ID',
                                  prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)
    ten_id = mas.GetEnvironmentKey(env_var='TENANT_ID',
                                   prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)

Read Workspace configuration from local config.json for workspace ASIHuntOMSWorkspaceV4

TENANT_ID: 72f988bf-86f1-41af-91ab-2d7cd011db47

SUBSCRIPTION_ID: 40dcc8bf-0478-4f3b-b275-ed0a94f2c013

RESOURCE_GROUP: ASIHuntOMSWorkspaceRG

WORKSPACE_ID: 52b1ab41-869e-4138-9e40-2a4457f09bf0

WORKSPACE_NAME: ASIHuntOMSWorkspaceV4
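
The configuration check can be approximated with the standard library alone. The sketch below assumes that config.json is a flat JSON object containing the keys displayed above, and that unfilled template values contain the text 'cookiecutter'. The helper name `load_workspace_ids` is hypothetical and is not part of msticpy.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Tuple

PLACEHOLDER = "cookiecutter"  # assumed marker for unfilled template values


def load_workspace_ids(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (workspace_id, tenant_id), or (None, None) if the file is unusable."""
    try:
        config = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None, None
    if not isinstance(config, dict):
        return None, None
    workspace_id = config.get("workspace_id", "")
    tenant_id = config.get("tenant_id", "")
    # Both values must be present and must not be template placeholders.
    for value in (workspace_id, tenant_id):
        if not value or PLACEHOLDER in value:
            return None, None
    return workspace_id, tenant_id


# Usage with a throwaway file holding made-up IDs
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.json"
    cfg.write_text(json.dumps({"workspace_id": "ws-1234", "tenant_id": "tn-5678"}),
                   encoding="utf-8")
    ids = load_workspace_ids(str(cfg))
    missing = load_workspace_ids(str(Path(tmp) / "absent.json"))
print(ids, missing)
```

Returning a pair of None values for every failure mode keeps the caller's fallback (prompting for the IDs) to a single check.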

Authenticate to Log Analytics

If you are using user/device authentication, run the following cell.

  • Click the 'Copy code to clipboard and authenticate' button.

  • This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard.

  • Select the text box and paste (Ctrl-V/Cmd-V) the copied value.

  • You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.

Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:

%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)

instead of

%kql loganalytics://code().workspace(WORKSPACE_ID)

Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
On successful authentication you should see a popup schema button.

if not WORKSPACE_ID or not TENANT_ID:
    try:
        WORKSPACE_ID = ws_id.value
        TENANT_ID = ten_id.value
    except NameError:
        raise ValueError('No workspace or Tenant Id.')

mas.kql.load_kql_magic()
%kql loganalytics://code().tenant(TENANT_ID).workspace(WORKSPACE_ID)


Get Alerts List

Specify a time range to search for alerts. Once this is set, run the following cell to retrieve any alerts in that time window. You can change the time range and re-run the queries until you find the alerts that you want.
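
To make the time range concrete, here is a minimal sketch of how an origin time plus "before" and "after" offsets in days could translate into a query window. It uses only the standard library and is not the msticpy QueryTime implementation; the function name `query_window` and the origin time are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(origin: datetime, before_days: int, after_days: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of a window around the origin time."""
    if before_days < 0 or after_days < 0:
        raise ValueError("offsets must be non-negative")
    return origin - timedelta(days=before_days), origin + timedelta(days=after_days)


# Example origin time (made up for illustration), 5 days before and 1 day after
origin = datetime(2019, 2, 26, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(origin, before_days=5, after_days=1)
print(start.isoformat(), end.isoformat())
```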

alert_q_times = mas.QueryTime(units='day', max_before=20, before=5, max_after=1)
alert_q_times.display()
HTML(value='<h4>Set query time boundaries</h4>')
HBox(children=(DatePicker(value=datetime.date(2019, 2, 26), description='Origin Date'), Text(value='20:06:14.9…
VBox(children=(IntRangeSlider(value=(-5, 1), description='Time Range (day):', layout=Layout(width='80%'), max=…
alert_counts = qry.list_alerts_counts(provs=[alert_q_times])
alert_list = qry.list_alerts(provs=[alert_q_times])
print(len(alert_counts), ' distinct alert types')
print(len(alert_list), ' distinct alerts')

display(HTML('<h2>Alert Timeline</h2>'))
nbdisp.display_timeline(data=alert_list, source_columns = ['AlertName', 'CompromisedEntity'], title='Alerts', height=200)

display(HTML('<h2>Top alerts</h2>'))
alert_counts.head(20) # remove '.head(20)' to see the full list grouped by AlertName
12 distinct alert types
51 distinct alerts


Choose Alert to Investigate

Either pick an alert from a list of retrieved alerts or paste the SystemAlertId into the text box in the following section.

Select alert from list

As you select an alert, the main properties will be shown below the list.

Use the filter box to narrow down your search to any substring in the AlertName.
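
As an illustration of substring filtering on AlertName, the sketch below filters a list of alert records held as plain dictionaries. It is not the SelectAlert widget's code: the case-insensitive match, the `filter_alerts` helper, and the SystemAlertId values are assumptions made for the example.

```python
from typing import Dict, List


def filter_alerts(alerts: List[Dict[str, str]], text: str) -> List[Dict[str, str]]:
    """Keep alerts whose AlertName contains the given substring (case-insensitive)."""
    needle = text.casefold()
    return [a for a in alerts if needle in a.get("AlertName", "").casefold()]


alerts = [
    {"AlertName": "Suspiciously named process detected", "SystemAlertId": "id-1"},
    {"AlertName": "Digital currency mining related behavior detected", "SystemAlertId": "id-2"},
    {"AlertName": "Suspicious system process executed", "SystemAlertId": "id-3"},
]
matches = filter_alerts(alerts, "process")
print([a["SystemAlertId"] for a in matches])
```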

alert_select = mas.SelectAlert(alerts=alert_list, action=nbdisp.display_alert)
alert_select.display()
VBox(children=(Text(value='', description='Filter alerts by title:', style=DescriptionStyle(description_width=…

Or paste in an alert ID and fetch it

Skip this if you selected from the above list

# Allow alert to be selected
# Allow subscription to be selected
get_alert = mas.GetSingleAlert(action=nbdisp.display_alert)
get_alert.display()
VBox(children=(Text(value='', description='SystemAlertId for alert :', layout=Layout(width='50%'), placeholder…


Extract properties and entities from Alert

This section extracts the alert information and entities into a SecurityAlert object allowing us to query the properties more reliably.

In particular, we use the alert to automatically provide parameters for queries and UI elements. Subsequent queries will use properties like the host name and derived properties such as the OS family (Linux or Windows) to adapt the query. Query time selectors like the one above will also default to an origin time that matches the alert selected.

The alert view below shows all of the main properties of the alert plus the extended property dictionary (if any) and JSON representations of the Entity.

# Extract entities and properties into a SecurityAlert class
if alert_select.selected_alert is None and get_alert.selected_alert is None:
    sys.exit("Please select an alert before executing remaining cells.")

if get_alert.selected_alert is not None:
    security_alert = mas.SecurityAlert(get_alert.selected_alert)
elif alert_select.selected_alert is not None:
    security_alert = mas.SecurityAlert(alert_select.selected_alert)

mas.disp.display_alert(security_alert, show_entities=True)
{ 'AzureID': '/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'} { 'Host': { 'AzureID': '/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'}, 'NTDomain': 'msticalertswin1', 'Name': 'msticadmin', 'Type': 'account'} { 'Account': { 'Host': { 'AzureID': '/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'}, 'NTDomain': 'msticalertswin1', 'Name': 'msticadmin', 'Type': 'account'}, 'EndTimeUtc': '2019-02-13T22:03:42.8164656Z', 'Host': { 'AzureID': '/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'}, 'SessionId': '0x1e821b5', 'StartTimeUtc': '2019-02-13T22:03:42.8164656Z', 'Type': 'hostlogonsession'} { 'Directory': 'c:\\w!ndows\\system32', 'FullPath': 'c:\\w!ndows\\system32\\suchost.exe', 'Name': 'suchost.exe', 'Type': 'file'} { 'Account': { 'Host': { 'AzureID': '/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'}, 'NTDomain': 'msticalertswin1', 'Name': 'msticadmin', 'Type': 'account'}, 'CommandLine': '.\\suchost.exe -a cryptonight -o bcn -u bond007.01 -p x -t 4', 'Host': { 'AzureID': 
'/subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourceGroups/ASIHuntOMSWorkspaceRG/providers/Microsoft.Compute/virtualMachines/MSTICAlertsWin1', 'HostName': 'msticalertswin1', 'OMSAgentID': '263a788b-6526-4cdc-8ed9-d79402fe4aa0', 'Type': 'host'}, 'ImageFile': { 'Directory': 'c:\\w!ndows\\system32', 'FullPath': 'c:\\w!ndows\\system32\\suchost.exe', 'Name': 'suchost.exe', 'Type': 'file'}, 'Type': 'process'}


Entity Graph

Depending on the type of alert there may be one or more entities attached as properties. Entities are things like Host, Account, IpAddress, Process, etc. - essentially the 'nouns' of security investigation. Events and alerts are the things that link them in actions, so they can be thought of as the verbs. Entities are often related to other entities - for example a process will usually have a related file entity (the process image) and an Account entity (the context in which the process was running). Endpoint alerts typically have a host entity (which could be a physical or virtual machine).
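
The relationships between entities can be pictured as edges in a graph. The following sketch walks a nested entity dictionary, shaped like the entity JSON shown earlier but abbreviated, and lists parent-to-child edges. It is a simplified illustration, not how msticpy's create_alert_graph builds its graph; `entity_label` and `entity_edges` are hypothetical helpers.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source label, relationship, target label)


def entity_label(entity: Dict[str, Any]) -> str:
    """Build a short 'type:name' label from whichever naming field is present."""
    name = entity.get("Name") or entity.get("HostName") or entity.get("CommandLine") or "?"
    return f"{entity.get('Type', 'unknown')}:{name}"


def entity_edges(entity: Dict[str, Any]) -> List[Edge]:
    """Recursively collect edges from an entity to its nested entities."""
    edges: List[Edge] = []
    for key, value in entity.items():
        # A nested dict with a 'Type' field is treated as a related entity.
        if isinstance(value, dict) and "Type" in value:
            edges.append((entity_label(entity), key, entity_label(value)))
            edges.extend(entity_edges(value))
    return edges


# Abbreviated example entity (values shortened for illustration)
process = {
    "Type": "process",
    "CommandLine": ".\\suchost.exe -a cryptonight",
    "ImageFile": {"Type": "file", "Name": "suchost.exe"},
    "Account": {
        "Type": "account",
        "Name": "msticadmin",
        "Host": {"Type": "host", "HostName": "msticalertswin1"},
    },
}
edges = entity_edges(process)
for src, rel, dst in edges:
    print(f"{src} --{rel}--> {dst}")
```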

Plot using Networkx/Matplotlib

# Draw the graph using Networkx/Matplotlib
%matplotlib inline
alertentity_graph = mas.create_alert_graph(security_alert)
nbdisp.draw_alert_entity_graph(alertentity_graph, width=15)
Image in a Jupyter notebook


Related Alerts

For a subset of entities in the alert we can search for any alerts that have that entity in common. Currently this query looks for alerts that share the same Host, Account or Process and lists them below. Notes:

  • Some alert types do not include all of these entity types.

  • The original alert will be included in the "Related Alerts" set if it occurs within the query time boundary set below.
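
The per-entity summary amounts to counting alert types among the rows that match each entity. A minimal sketch of that idea, using plain dictionaries instead of a DataFrame, follows. The column names host_match, acct_match and proc_match mirror those used by the query cell later in this section; the sample rows and the `count_related` helper are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def count_related(alerts: List[dict], match_column: str) -> Dict[str, int]:
    """Count alerts per AlertType among rows where the given match flag is true."""
    return dict(Counter(a["AlertType"] for a in alerts if a.get(match_column)))


# Invented sample rows
related = [
    {"AlertType": "Suspiciously named process detected",
     "host_match": True, "acct_match": True, "proc_match": True},
    {"AlertType": "Suspiciously named process detected",
     "host_match": True, "acct_match": True, "proc_match": True},
    {"AlertType": "New SSH key added",
     "host_match": False, "acct_match": True, "proc_match": False},
]
for column in ("host_match", "acct_match", "proc_match"):
    print(column, count_related(related, column))
```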

The query time boundaries default to a longer period than when searching for the alert. You can extend the time boundary searched before or after the alert time. If the widget doesn't support the time boundary that you want you can change the max_before and max_after parameters in the call to QueryTime below to extend the possible time boundaries.

# set the origin time to the time of our alert
query_times = mas.QueryTime(units='day', origin_time=security_alert.TimeGenerated,
                            max_before=28, max_after=1, before=5)
query_times.display()
HTML(value='<h4>Set query time boundaries</h4>')
HBox(children=(DatePicker(value=datetime.date(2019, 2, 13), description='Origin Date'), Text(value='22:04:16',…
VBox(children=(IntRangeSlider(value=(-5, 1), description='Time Range (day):', layout=Layout(width='80%'), max=…
related_alerts = qry.list_related_alerts(provs=[query_times, security_alert])

if related_alerts is not None and not related_alerts.empty:
    host_alert_items = related_alerts\
        .query('host_match == @True')[['AlertType', 'StartTimeUtc']]\
        .groupby('AlertType').StartTimeUtc.agg('count').to_dict()
    acct_alert_items = related_alerts\
        .query('acct_match == @True')[['AlertType', 'StartTimeUtc']]\
        .groupby('AlertType').StartTimeUtc.agg('count').to_dict()
    proc_alert_items = related_alerts\
        .query('proc_match == @True')[['AlertType', 'StartTimeUtc']]\
        .groupby('AlertType').StartTimeUtc.agg('count').to_dict()

    def print_related_alerts(alertDict, entityType, entityName):
        if len(alertDict) > 0:
            print('Found {} different alert types related to this {} (\'{}\')'
                  .format(len(alertDict), entityType, entityName))
            for (k,v) in alertDict.items():
                print('    {}, Count of alerts: {}'.format(k, v))
        else:
            print('No alerts for {} entity \'{}\''.format(entityType, entityName))

    print_related_alerts(host_alert_items, 'host', security_alert.hostname)
    print_related_alerts(acct_alert_items, 'account',
                         security_alert.primary_account.qualified_name
                         if security_alert.primary_account
                         else None)
    print_related_alerts(proc_alert_items, 'process',
                         security_alert.primary_process.ProcessFilePath
                         if security_alert.primary_process
                         else None)
    nbdisp.display_timeline(data=related_alerts, source_columns = ['AlertName'], title='Alerts', height=100)
else:
    display(Markdown('No related alerts found.'))
Found 8 different alert types related to this host ('msticalertswin1')
    Detected potentially suspicious use of Telegram tool, Count of alerts: 2
    Detected the disabling of critical services, Count of alerts: 2
    Digital currency mining related behavior detected, Count of alerts: 2
    Potential attempt to bypass AppLocker detected, Count of alerts: 4
    Security incident detected, Count of alerts: 2
    Security incident with shared process detected, Count of alerts: 3
    Suspicious system process executed, Count of alerts: 2
    Suspiciously named process detected, Count of alerts: 2
Found 13 different alert types related to this account ('msticalertswin1\msticadmin')
    An history file has been cleared, Count of alerts: 12
    Azure Security Center test alert (not a threat), Count of alerts: 13
    Detected potentially suspicious use of Telegram tool, Count of alerts: 2
    Detected the disabling of critical services, Count of alerts: 2
    Digital currency mining related behavior detected, Count of alerts: 2
    New SSH key added, Count of alerts: 13
    Possible credential access tool detected, Count of alerts: 11
    Possible suspicious scheduling tasks access detected, Count of alerts: 1
    Potential attempt to bypass AppLocker detected, Count of alerts: 3
    Suspicious Download Then Run Activity, Count of alerts: 13
    Suspicious binary detected, Count of alerts: 13
    Suspicious system process executed, Count of alerts: 2
    Suspiciously named process detected, Count of alerts: 2
Found 2 different alert types related to this process ('c:\w!ndows\system32\suchost.exe')
    Digital currency mining related behavior detected, Count of alerts: 2
    Suspiciously named process detected, Count of alerts: 2

This should indicate which entities the other alerts are related to.

This can be unreadable with a lot of alerts. Use the matplotlib interactive zoom control to zoom in to part of the graph.

# Draw a graph of this (add to entity graph)
%matplotlib notebook
%matplotlib inline

if related_alerts is not None and not related_alerts.empty:
    rel_alert_graph = mas.add_related_alerts(related_alerts=related_alerts,
                                             alertgraph=alertentity_graph)
    nbdisp.draw_alert_entity_graph(rel_alert_graph, width=15)
else:
    display(Markdown('No related alerts found.'))

Select an Alert to view details.

If you want to investigate that alert - copy its SystemAlertId property and open a new instance of this notebook to investigate this alert.

def disp_full_alert(alert):
    global related_alert
    related_alert = mas.SecurityAlert(alert)
    nbdisp.display_alert(related_alert, show_entities=True)

if related_alerts is not None and not related_alerts.empty:
    related_alerts['CompromisedEntity'] = related_alerts['Computer']
    print('Selected alert is available as \'related_alert\' variable.')
    rel_alert_select = mas.SelectAlert(alerts=related_alerts, action=disp_full_alert)
    rel_alert_select.display()
else:
    display(Markdown('No related alerts found.'))
Selected alert is available as 'related_alert' variable.
VBox(children=(Text(value='', description='Filter alerts by title:', style=DescriptionStyle(description_width=…


Get Process Tree

If the alert has a process entity this section tries to retrieve the entire process tree to which that process belongs.

Notes:

  • The alert must have a process entity

  • Only processes started within the query time boundary will be included

  • Ancestor and descendant processes are retrieved to two levels (i.e. the parent and grandparent of the alert process plus any child and grandchild processes).

  • Sibling processes are the processes that share the same parent as the alert process

  • This can be a long-running query, especially if a wide time window is used! Caveat Emptor!

The source (alert) process is shown in red.

What's shown for each process:

  • Each process line is indented according to its position in the tree hierarchy

  • Top line fields:

    • [relationship to source process:lev# - where # is the hops away from the source process]

    • Process creation date-time (UTC)

    • Process Image path

    • PID - Process Id

    • SubjSess - the session Id of the process spawning the new process

    • TargSess - the new session Id if the process is launched in another context/session. If 0/0x0 then the process is launched in the same session as its parent

  • Second line fields:

    • Process command line

    • Account - name of the account context in which the process is running
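
The indented layout described above can be illustrated with a small standalone sketch. It renders a tree from parent/child process IDs and marks the source process. The field names (pid, ppid, image) and the sample records are assumptions for the example, and this is not the code behind nbdisp.display_process_tree.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Invented sample records: ppid is None for the root of the tree
procs = [
    {"pid": 100, "ppid": None, "image": "explorer.exe"},
    {"pid": 200, "ppid": 100, "image": "cmd.exe"},
    {"pid": 300, "ppid": 200, "image": "suchost.exe"},
    {"pid": 400, "ppid": 200, "image": "reg.exe"},
]


def render_tree(processes: List[dict], source_pid: int) -> List[str]:
    """Return one indented text line per process, depth-first from the roots."""
    children: Dict[Optional[int], List[dict]] = defaultdict(list)
    for proc in processes:
        children[proc["ppid"]].append(proc)

    lines: List[str] = []

    def walk(parent_pid: Optional[int], depth: int) -> None:
        for proc in children.get(parent_pid, []):
            marker = " <-- alert process" if proc["pid"] == source_pid else ""
            lines.append(f"{'  ' * depth}{proc['image']} (PID {proc['pid']}){marker}")
            walk(proc["pid"], depth + 1)

    walk(None, 0)
    return lines


tree_lines = render_tree(procs, source_pid=300)
print("\n".join(tree_lines))
```

In this sample, reg.exe is a sibling of the alert process because both share cmd.exe as their parent.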

# set the origin time to the time of our alert
query_times = mas.QueryTime(units='minute', origin_time=security_alert.origin_time)
query_times.display()
HTML(value='<h4>Set query time boundaries</h4>')
HBox(children=(DatePicker(value=datetime.date(2019, 2, 13), description='Origin Date'), Text(value='22:04:16',…
VBox(children=(IntRangeSlider(value=(-60, 10), description='Time Range (min):', layout=Layout(width='80%'), mi…
from msticpy.nbtools.query_defns import DataFamily

if security_alert.data_family != DataFamily.WindowsSecurity:
    raise ValueError('The remainder of this notebook currently only supports Windows. '
                     'Linux support is in development but not yet implemented.')

def extract_missing_pid(security_alert):
    for pid_ext_name in ['Process Id', 'Suspicious Process Id']:
        pid = security_alert.ExtendedProperties.get(pid_ext_name, None)
        if pid:
            return pid

def extract_missing_sess_id(security_alert):
    sess_id = security_alert.ExtendedProperties.get('Account Session Id', None)
    if sess_id:
        return sess_id
    for session in [e for e in security_alert.entities
                    if e['Type'] == 'host-logon-session' or e['Type'] == 'hostlogonsession']:
        return session['SessionId']

if (security_alert.primary_process):
    # Do some patching up if the process entity doesn't have a PID
    pid = security_alert.primary_process.ProcessId
    if not pid:
        pid = extract_missing_pid(security_alert)
        if pid:
            security_alert.primary_process.ProcessId = pid
        else:
            raise ValueError('Could not find the process Id for the alert process.')

    # Do the same if we can't find the account logon ID
    if not security_alert.get_logon_id():
        sess_id = extract_missing_sess_id(security_alert)
        if sess_id and security_alert.primary_account:
            security_alert.primary_account.LogonId = sess_id
        else:
            raise ValueError('Could not find the session Id for the alert process.')

    # run the query
    process_tree = qry.get_process_tree(provs=[query_times, security_alert])

    if len(process_tree) > 0:
        # Print out the text view of the process tree
        nbdisp.display_process_tree(process_tree)
    else:
        display(Markdown('No processes were returned so cannot obtain a process tree.'
                         '\n\nSkip to [Other Processes](#process_clustering) later in the'
                         ' notebook to retrieve all processes'))
else:
    display(Markdown('This alert has no process entity so cannot obtain a process tree.'
                     '\n\nSkip to [Other Processes](#process_clustering) later in the'
                     ' notebook to retrieve all processes'))
    process_tree = None


Process TimeLine

This shows each process in the process tree on a timeline view.

Labelling of individual processes is very performance intensive and often results in nothing being displayed at all! Besides, for large numbers of processes it would likely result in an unreadable mess.

Your main tools for navigating the timeline are the Hover tool (toggled on and off by the speech bubble icon) and the wheel-zoom and pan tools (the former is an icon with an ellipse and a magnifying glass, the latter is the crossed-arrows icon). The wheel zoom is particularly useful.

As you hover over each process it will display the image name, PID and commandline.

Also shown on the graphic is the timestamp line of the source/alert process.

# Show timeline of events
if process_tree is not None and not process_tree.empty:
    nbdisp.display_timeline(data=process_tree, alert=security_alert,
                            title='Alert Process Session', height=250)
Alert start time = 2019-02-13 22:03:42


Other Processes on Host - Clustering

Sometimes you don't have a source process to work with. Other times it's just useful to see what else is going on on the host. This section retrieves all processes on the host within the time bounds set in the query times widget.

You can display the raw output of this by looking at the processes_on_host dataframe. Just copy this into a new cell and hit Ctrl-Enter.

Usually though, the results return a lot of very repetitive and uninteresting system processes, so we attempt to cluster these to make the view easier to navigate. To do this we process the raw event list output to extract a few features that render strings (such as the command line) into numerical values. The default below uses the following features:

  • commandlineTokensFull - this is a count of common delimiters in the commandline (given by this regex r'[\s-\/.,"'|&:;%$()]'). The aim of this is to capture the commandline structure while ignoring variations on what is essentially the same pattern (e.g. temporary path GUIDs, target IP or host names, etc.)

  • pathScore - this sums the ordinal (character) value of each character in the path (so /bin/bash and /bin/bosh would have similar scores).

  • isSystemSession - 1 if this is a root/system session, 0 if anything else.
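
A minimal sketch of the first two features is shown below. It does not reproduce msticpy's add_process_features: the delimiter set is modelled on the regex quoted above, with escaping that is an assumption, and the function names are hypothetical.

```python
import re

# Assumed delimiter set: whitespace, dash, backslash, slash and common punctuation
DELIMITERS = re.compile(r"[\s\-\\/.,\"'|&:;%$()]")


def commandline_tokens(cmdline: str) -> int:
    """Count delimiter characters as a rough measure of command line structure."""
    return len(DELIMITERS.findall(cmdline))


def path_score(path: str) -> int:
    """Sum the ordinal value of each character in the path."""
    return sum(ord(ch) for ch in path)


# Same structure, different host name: identical delimiter counts
count_a = commandline_tokens("updatepatch host1.mydom.com")
count_b = commandline_tokens("updatepatch host2.mydom.com")
print(count_a, count_b)

# Paths that differ by one character have close, but not equal, scores
score_gap = path_score("/bin/bosh") - path_score("/bin/bash")
print(score_gap)
```

Counting delimiters rather than comparing content is what lets command lines that differ only in a host name or GUID fall into the same group.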

Then we run a clustering algorithm (DBScan in this case) on the process list. The result groups similar (noisy) processes together and leaves unique process patterns as single-member clusters.
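
To show the grouping effect without any dependencies, the sketch below uses a simplified stand-in for DBSCAN: it groups points that lie within a distance threshold of each other, directly or through a chain of neighbours. When every point is allowed to form its own cluster (minimum cluster size of one), DBSCAN's clusters are exactly these connected groups. This is not the dbcluster_events implementation, it applies no feature scaling, and the feature values are invented.

```python
from math import dist
from typing import List, Sequence


def cluster_points(points: Sequence[Sequence[float]], max_distance: float) -> List[int]:
    """Assign a cluster label to each point; chained neighbours share a label."""
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already assigned to a cluster
        labels[start] = next_label
        stack = [start]
        while stack:
            current = stack.pop()
            for other in range(len(points)):
                if labels[other] == -1 and dist(points[current], points[other]) <= max_distance:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


# Invented feature rows: (commandline tokens, path score, is system session)
features = [(3, 1200, 1), (3, 1200, 1), (3, 1200, 1), (7, 2950, 0), (12, 4100, 0)]
labels = cluster_points(features, max_distance=0.0001)
sizes = {label: labels.count(label) for label in set(labels)}
print(labels, sizes)
```

The three identical rows collapse into one cluster of size 3, while the two distinct rows remain as single-member clusters.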

Clustered Processes (i.e. processes that have a cluster size > 1)

from msticpy.sectools.eventcluster import dbcluster_events, add_process_features

processes_on_host = qry.list_processes(provs=[query_times, security_alert])

if processes_on_host is not None and not processes_on_host.empty:
    feature_procs = add_process_features(input_frame=processes_on_host,
                                         path_separator=security_alert.path_separator)

    # you might need to play around with the max_cluster_distance parameter.
    # decreasing this gives more clusters.
    (clus_events, dbcluster, x_data) = dbcluster_events(
        data=feature_procs,
        cluster_columns=['commandlineTokensFull', 'pathScore', 'isSystemSession'],
        max_cluster_distance=0.0001)
    print('Number of input events:', len(feature_procs))
    print('Number of clustered events:', len(clus_events))
    clus_events[['ClusterSize', 'processName']][clus_events['ClusterSize'] > 1].plot.bar(
        x='processName', title='Process names with Cluster > 1', figsize=(12,3));
else:
    display(Markdown('Unable to obtain any processes for this host. This feature'
                     ' is currently only supported for Windows hosts.'
                     '\n\nIf this is a Windows host skip to [Host Logons](#host_logons)'
                     ' later in the notebook to examine logon events.'))
Number of input events: 190
Number of clustered events: 24
Image in a Jupyter notebook

Variability in Command Lines and Process Names

The top chart shows the variability of command line content for a given process name. The wider the box, the more instances were found with a different command line structure.

Note, the 'structure' in this case is measured by the number of tokens or delimiters in the command line and does not look at content differences. This is done so that commonly varying instances of the same command line are grouped together.
For example updatepatch host1.mydom.com and updatepatch host2.mydom.com will be grouped together.

The second chart shows the variability in executable path. This does compare content, so c:\windows\system32\net.exe and e:\windows\system32\net.exe are treated as distinct. You would normally not expect to see any variability in this chart unless you have multiple copies of the same name executable or an executable is trying to masquerade as another well-known binary.

# Looking at the variability of commandlines and process image paths
import seaborn as sns
sns.set(style="darkgrid")

if processes_on_host is not None and not processes_on_host.empty:
    proc_plot = sns.catplot(y="processName", x="commandlineTokensFull",
                            data=feature_procs.sort_values('processName'),
                            kind='box', height=10)
    proc_plot.fig.suptitle('Variability of Commandline Tokens', x=1, y=1)

    proc_plot = sns.catplot(y="processName", x="pathLogScore",
                            data=feature_procs.sort_values('processName'),
                            kind='box', height=10, hue='isSystemSession')
    proc_plot.fig.suptitle('Variability of Path', x=1, y=1);
Image in a Jupyter notebook
Image in a Jupyter notebook

The top graph shows that, for a given process, some have a wide variability in their command line content while the majority have little or none. Looking at a couple of examples - like cmd.exe, powershell.exe, reg.exe, net.exe - we can recognize several common command line tools.

The second graph shows processes by full process path content. We wouldn't normally expect to see variation here - as is the case with most. There is also quite a lot of variance in the score, making it a useful proxy feature for unique path name (this means that proc1.exe and proc2.exe that have the same commandline score won't get collapsed into the same cluster).

Any process with a spread of values here means that the same process name (but not necessarily the same file) is being run from different locations.
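
The same check can be made directly on the paths rather than read off a chart. The sketch below groups full Windows paths by executable name and reports names seen in more than one directory. Lower-casing the path before comparison is an assumption (Windows paths are normally case-insensitive), and `names_with_multiple_paths` is a hypothetical helper, not part of msticpy.

```python
import ntpath
from collections import defaultdict
from typing import Dict, Iterable, List


def names_with_multiple_paths(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each executable name to its directories, keeping names with more than one."""
    seen = defaultdict(set)
    for full_path in paths:
        normalized = full_path.lower()
        seen[ntpath.basename(normalized)].add(ntpath.dirname(normalized))
    return {name: sorted(dirs) for name, dirs in seen.items() if len(dirs) > 1}


paths = [
    r"c:\windows\system32\net.exe",
    r"e:\windows\system32\net.exe",
    r"c:\windows\system32\cmd.exe",
    r"C:\Windows\System32\cmd.exe",  # same location, different letter case
]
suspects = names_with_multiple_paths(paths)
print(suspects)
```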

if not clus_events.empty:
    resp = input('View the clustered data? y/n')
    if resp == 'y':
        display(clus_events.sort_values('TimeGenerated')[['TimeGenerated', 'LastEventTime',
                                                          'NewProcessName', 'CommandLine',
                                                          'ClusterSize', 'commandlineTokensFull',
                                                          'pathScore', 'isSystemSession']])
View the clustered data? y/ny
# Look at clusters for individual process names
def view_cluster(exe_name):
    display(clus_events[['ClusterSize', 'processName', 'CommandLine', 'ClusterId']][clus_events['processName'] == exe_name])

display(Markdown('You can view the cluster members for individual processes '
                 'by inserting a new cell and entering:<br>'
                 '`>>> view_cluster(process_name)`<br>'
                 'where process_name is the unqualified process binary. E.g.<br>'
                 '`>>> view_cluster(\'reg.exe\')`'))

You can view the cluster members for individual processes by inserting a new cell and entering:
>>> view_cluster(process_name)
where process_name is the unqualified process binary. E.g
>>> view_cluster('reg.exe')

Timeline showing clustered vs. original data

# Show timeline of events - clustered events
if not clus_events.empty:
    nbdisp.display_timeline(data=clus_events,
                            overlay_data=processes_on_host,
                            alert=security_alert,
                            title='Distinct Host Processes (top) and All Processes (bottom)')
Alert start time = 2019-02-13 22:03:42


Base64 Decode and Check for IOCs

This section looks for Indicators of Compromise (IoC) within the data sets passed to it.

The first section looks at the commandline for the alert process (if any). It also looks for base64 encoded strings within the data - this is a common way of hiding attacker intent. It attempts to decode any strings that look like base64. Additionally, if the base64 decode operation returns any items that look like a base64 encoded string or file, a gzipped binary sequence, a zipped or tar archive, it will attempt to extract the contents before searching for potentially interesting items.
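
As a rough illustration of the base64 step, the sketch below finds base64-looking substrings in a command line and decodes the ones that yield valid UTF-8 text. It is a single-pass simplification and not msticpy's unpack_items: it does not handle nested encodings, archives, or UTF-16 payloads. The minimum candidate length of 16 characters and the sample command line are assumptions.

```python
import base64
import binascii
import re
from typing import List

# Assumed heuristic: 16 or more base64 alphabet characters, optional padding
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_base64_items(text: str) -> List[str]:
    """Return decoded text for each base64-looking substring that decodes cleanly."""
    decoded = []
    for match in B64_CANDIDATE.finditer(text):
        candidate = match.group(0)
        if len(candidate) % 4:
            continue  # valid base64 length is a multiple of 4
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not base64 after all, or not text
    return decoded


# Build an invented command line that hides a second command in base64
payload = base64.b64encode(b"whoami /all && ipconfig /all").decode("ascii")
cmdline = f"cmd.exe /c echo {payload}"
found = decode_base64_items(cmdline)
print(found)
```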

process = security_alert.primary_process
ioc_extractor = sectools.IoCExtract()

if process:
    # if nothing is decoded this just returns the input string unchanged
    base64_dec_str, _ = sectools.b64.unpack_items(input_string=process["CommandLine"])
    if base64_dec_str and '<decoded' in base64_dec_str:
        print('Base64 encoded items found.')
        print(base64_dec_str)

    # any IoCs in the string?
    iocs_found = ioc_extractor.extract(base64_dec_str)

    if iocs_found:
        print('\nPotential IoCs found in alert process:')
        display(iocs_found)
else:
    print('Nothing to process')
Potential IoCs found in alert process:
defaultdict(set, {'windows_path': {'.\\suchost.exe'}})

If we have a process tree, look for IoCs in the whole data set

You can change the data argument passed to ioc_extractor.extract() (data=source_processes in the cell below) to search other data frames. Use the columns parameter to specify which column or columns you want to search.

ioc_extractor = sectools.IoCExtract()

try:
    if not process_tree.empty:
        source_processes = process_tree
    else:
        source_processes = clus_events
except NameError:
    source_processes = None

if source_processes is not None:
    ioc_df = ioc_extractor.extract(data=source_processes,
                                   columns=['CommandLine'],
                                   os_family=security_alert.os_family,
                                   ioc_types=['ipv4', 'ipv6', 'dns', 'url',
                                              'md5_hash', 'sha1_hash', 'sha256_hash'])
    if len(ioc_df):
        display(HTML("<h3>IoC patterns found in process tree.</h3>"))
        display(ioc_df)
else:
    ioc_df = None
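To give a feel for what pattern-based IoC extraction over command lines involves, the following is a simplified standard-library sketch. The regular expressions, the function name, and the sample command lines are assumptions for illustration only; the patterns used by the IoCExtract class are more extensive and are not reproduced here.

```python
import re
from collections import defaultdict

# Simplified, illustrative patterns (assumptions, not the library's own)
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(command_lines):
    """Return {ioc_type: set of matches} for every pattern with at least one hit."""
    found = defaultdict(set)
    for line in command_lines:
        for ioc_type, pattern in IOC_PATTERNS.items():
            found[ioc_type].update(pattern.findall(line))
    return {ioc_type: values for ioc_type, values in found.items() if values}


# Hypothetical sample command lines
sample_lines = [
    "cmd /c ping 10.1.2.3",
    'powershell -c "iwr http://example.test/getps"',
]
iocs = extract_iocs(sample_lines)
print(iocs)
```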

If any Base64 encoded strings, decode and search for IoCs in the results.

For simple strings the Base64 decoded output is straightforward. However for nested encodings this can get a little complex and difficult to represent in a tabular format.

Columns

  • reference - The index of the row item in dotted notation in depth.seq pairs (e.g. 1.2.2.3 would be the 3rd item at depth 3 that is a child of the 2nd item found at depth 1). This may not always be an accurate notation - it is mainly used to allow you to associate an individual row with the reference value contained in the full_decoded_string column of the topmost item.

  • original_string - the original string before decoding.

  • file_name - filename, if any (only if this is an item in zip or tar file).

  • file_type - a guess at the file type (this is currently elementary and only includes a few file types).

  • input_bytes - the decoded bytes as a Python bytes string.

  • decoded_string - the decoded string if it can be decoded as a UTF-8 or UTF-16 string. Note: binary sequences may often successfully decode as UTF-16 strings but, in these cases, the decodings are meaningless.

  • encoding_type - encoding type (UTF-8 or UTF-16) if a decoding was possible, otherwise 'binary'.

  • file_hashes - collection of file hashes for any decoded item.

  • md5 - md5 hash as a separate column.

  • sha1 - sha1 hash as a separate column.

  • sha256 - sha256 hash as a separate column.

  • printable_bytes - printable version of input_bytes as a string of \xNN values

  • src_index - the index of the row in the input dataframe from which the data came.

  • full_decoded_string - the full decoded string with any decoded replacements. This is only really useful for top-level items, since nested items will only show the 'full' string representing the child fragment.
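Several of these columns can be derived from the decoded bytes alone. The sketch below shows one way to compute values in the spirit of file_hashes, decoded_string, encoding_type and printable_bytes with the standard library. The function name and the exact output formatting are assumptions; the library's own output may differ in detail.

```python
import hashlib


def describe_bytes(data: bytes) -> dict:
    """Build a small record of hashes and text decodings for a byte sequence."""
    hashes = {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }
    # Try UTF-8 first, then UTF-16; fall back to 'binary' if neither works
    try:
        decoded, encoding = data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        try:
            decoded, encoding = data.decode("utf-16"), "utf-16"
        except UnicodeDecodeError:
            decoded, encoding = None, "binary"
    return {
        "input_bytes": data,
        "decoded_string": decoded,
        "encoding_type": encoding,
        "file_hashes": hashes,
        # every byte rendered as \xNN
        "printable_bytes": "".join(f"\\x{byte:02x}" for byte in data),
    }


item = describe_bytes(b"hello")
print(item["encoding_type"], item["printable_bytes"])
print(item["file_hashes"])
```

As the decoded_string note says, a successful UTF-16 decode does not prove the bytes were text, so the hash columns are often the more reliable handle on binary content.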

if source_processes is not None:
    dec_df = sectools.b64.unpack_items(data=source_processes, column='CommandLine')

if source_processes is not None and not dec_df.empty:
    display(HTML("<h3>Decoded base 64 command lines</h3>"))
    display(HTML("Warning - some binary patterns may be decodable as unicode strings"))
    display(dec_df[['full_decoded_string', 'original_string', 'decoded_string',
                    'input_bytes', 'file_hashes']])

    ioc_dec_df = ioc_extractor.extract(data=dec_df, columns=['full_decoded_string'])
    if len(ioc_dec_df):
        display(HTML("<h3>IoC patterns found in base 64 decoded data</h3>"))
        display(ioc_dec_df)
        if ioc_df is not None:
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            ioc_df = pd.concat([ioc_df, ioc_dec_df], ignore_index=True)
        else:
            ioc_df = ioc_dec_df
else:
    # Leave ioc_df untouched so IoCs already found in the process data are kept
    print("No base64 encodings found.")


Virus Total Lookup

This section uses the popular VirusTotal service to check any recovered IoCs against VT's database.

To use this you need an API key from VirusTotal, which you can obtain here: https://www.virustotal.com/.

Note that VT throttles requests for free API keys to 4/minute. If you are unable to process the entire data set, try splitting it and submitting smaller chunks.
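One way to stay under a per-minute limit is to submit observables in small batches with a pause between them. The sketch below is a generic illustration: submit is a hypothetical placeholder for whatever function performs a single lookup, and the batch size and pause are parameters you would tune to your key's quota.

```python
import time
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def submit_in_batches(observables, submit, batch_size=4, pause_seconds=60.0,
                      sleep=time.sleep):
    """Call submit() on each observable, pausing between batches."""
    results = []
    for index, batch in enumerate(chunked(observables, batch_size)):
        if index:  # no pause needed before the first batch
            sleep(pause_seconds)
        results.extend(submit(item) for item in batch)
    return results


# Demonstration with a stand-in submit function and a recorded (not real) sleep
pauses = []
results = submit_in_batches(list("abcdefghij"), submit=str.upper, sleep=pauses.append)
print(results, pauses)
```

Passing sleep as a parameter keeps the example quick to run; in real use the default time.sleep would apply.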

Things to note:

  • Virus Total lookups include file hashes, domains, IP addresses and URLs.

  • The returned data is slightly different depending on the input type

  • The VTLookup class tries to screen input data to prevent pointless lookups. E.g.:

    • Only public IP Addresses will be submitted (no loopback, private address space, etc.)

    • URLs with only local (unqualified) host parts will not be submitted.

    • Domain names that are unqualified will not be submitted.

    • Hash-like strings (e.g 'AAAAAAAAAAAAAAAAAA') that do not appear to have enough entropy to be a hash will not be submitted.
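The screening rules above can be approximated with a few standard-library checks. This is a sketch of the general idea only; the function names, the entropy threshold, and the accepted hash lengths are assumptions and do not reflect how VTLookup implements its filtering.

```python
import ipaddress
import math
from collections import Counter


def is_public_ip(value):
    """True only for globally routable addresses (no loopback, private ranges, etc.)."""
    try:
        return ipaddress.ip_address(value).is_global
    except ValueError:
        return False


def is_qualified_domain(value):
    """True if the name has at least one dot between labels."""
    return "." in value.strip(".")


def shannon_entropy(value):
    """Entropy in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_hash(value, min_entropy=3.0):
    """Hex-digest length check plus a minimum entropy (threshold is an assumption)."""
    return len(value) in (32, 40, 64) and shannon_entropy(value.lower()) >= min_entropy


print(is_public_ip("10.0.0.1"), is_public_ip("8.8.8.8"))
print(is_qualified_domain("localhost"), is_qualified_domain("example.org"))
print(looks_like_hash("a" * 32), looks_like_hash("0123456789abcdef" * 2))
```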

Output Columns

  • Observable - The IoC observable submitted

  • IoCType - the IoC type

  • Status - the status of the submission request

  • ResponseCode - the VT response code

  • RawResponse - the entire raw json response

  • Resource - VT Resource

  • SourceIndex - The index of the Observable in the source DataFrame. You can use this to rejoin to your original data.

  • VerboseMsg - VT Verbose Message

  • ScanId - VT Scan ID if any

  • Permalink - VT Permanent URL describing the resource

  • Positives - If this is not zero, it indicates the number of malicious reports that VT holds for this observable.

  • MD5 - The MD5 hash, if any

  • SHA1 - The SHA1 hash, if any

  • SHA256 - The SHA256 hash, if any

  • ResolvedDomains - In the case of IP Addresses, this contains a list of all domains that resolve to this IP address

  • ResolvedIPs - In the case of domains, this contains a list of all IP addresses resolved from the domain.

  • DetectedUrls - Any malicious URLs associated with the observable.

vt_key = mas.GetEnvironmentKey(env_var='VT_API_KEY',
                               help_str='To obtain an API key sign up here https://www.virustotal.com/',
                               prompt='Virus Total API key:')
vt_key.display()
if vt_key.value and ioc_df is not None and not ioc_df.empty:
    vt_lookup = sectools.VTLookup(vt_key.value, verbosity=2)

    print(f'{len(ioc_df)} items in input frame')
    supported_counts = {}
    for ioc_type in vt_lookup.supported_ioc_types:
        supported_counts[ioc_type] = len(ioc_df[ioc_df['IoCType'] == ioc_type])
    print('Items in each category to be submitted to VirusTotal')
    print('(Note: items have pre-filtering to remove obvious erroneous '
          'data and false positives, such as private IPaddresses)')
    print(supported_counts)
    print('-' * 80)
    vt_results = vt_lookup.lookup_iocs(data=ioc_df, type_col='IoCType', src_col='Observable')

    pos_vt_results = vt_results.query('Positives > 0')
    if len(pos_vt_results) > 0:
        display(HTML(f'<h3>{len(pos_vt_results)} Positive Results Found</h3>'))
        display(pos_vt_results[['Observable', 'IoCType', 'Permalink',
                                'ResolvedDomains', 'ResolvedIPs',
                                'DetectedUrls', 'RawResponse']])
    display(HTML('<h3>Other results</h3>'))
    display(vt_results.query('Status == "Success"'))
5 items in input frame
Items in each category to be submitted to VirusTotal
(Note: items have pre-filtering to remove obvious erroneous data and false positives, such as private IPaddresses)
{'ipv4': 0, 'dns': 2, 'url': 2, 'md5_hash': 0, 'sha1_hash': 0, 'sh256_hash': 0}
--------------------------------------------------------------------------------
Invalid observable format: "wh401k.org", type "dns", status: Observable does not match expected pattern for dns - skipping. (Source index 4)
Invalid observable format: "wh401k.org", type "dns", status: Observable does not match expected pattern for dns - skipping. (Source index 0)
Submitting observables: "http://wh401k.org/getps"", type "url" to VT. (Source index 4)
Error in response submitting observables: "http://wh401k.org/getps"", type "url" http status is 403. Response: None (Source index 4)
Submitting observables: "http://wh401k.org/getps"</decoded>", type "url" to VT. (Source index 0)
Error in response submitting observables: "http://wh401k.org/getps"</decoded>", type "url" http status is 403. Response: None (Source index 0)
Submission complete. 4 responses from 5 input rows

To view the raw response for a specific row:

import json
row_idx = 0  # The row number from one of the above dataframes
raw_response = json.loads(pos_vt_results['RawResponse'].loc[row_idx])
raw_response


Alert command line - Occurrence on other hosts in workspace

To get a sense of whether the alert process is occurring on other hosts, run this section.

This might tell you that the alerted process is actually a commonly-run process and the alert is a false positive. Alternatively, it may tell you that a real infection or attack is happening on other hosts in your environment.

# set the origin time to the time of our alert
query_times = mas.QueryTime(units='day', before=5, max_before=20,
                            after=1, max_after=10,
                            origin_time=security_alert.origin_time)
query_times.display()
# API ILLUSTRATION - Find the query to use
qry.list_queries()
['list_alerts_counts', 'list_alerts', 'get_alert', 'list_related_alerts', 'list_related_ip_alerts', 'get_process_tree', 'list_processes', 'get_process_parent', 'list_hosts_matching_commandline', 'list_processes_in_session', 'get_host_logon', 'list_host_logons', 'list_host_logon_failures']
# API ILLUSTRATION - What does the query look like?
qry.query_help('list_hosts_matching_commandline')
Query: list_hosts_matching_commandline
Retrieves processes on other hosts with matching commandline
Designed to be executed with data_source: process_create
Supported data families: DataFamily.WindowsSecurity, DataFamily.LinuxSecurity
Supported data environments: DataEnvironment.LogAnalytics
Query parameters: ['add_query_items', 'subscription_filter', 'process_name', 'start', 'end', 'host_filter_neq', 'commandline']
Optional parameters: add_query_items
Query:
{table} {query_project}
| where {subscription_filter}
| where {host_filter_neq}
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| where NewProcessName endswith '{process_name}'
| where CommandLine =~ '{commandline}'
{add_query_items}
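The query text above is a template with named placeholders. As a rough sketch of how such a template can be filled in, the example below uses Python's str.format with a shortened copy of the template. The helper names are hypothetical, the table name and parameter values are sample inputs, and the escaping shown is an assumption about what a single-quoted KQL string literal needs; it is not a description of how the query provider performs substitution.

```python
from datetime import datetime

# Shortened copy of the template for illustration
QUERY_TEMPLATE = (
    "{table}\n"
    "| where TimeGenerated >= datetime({start})\n"
    "| where TimeGenerated <= datetime({end})\n"
    "| where NewProcessName endswith '{process_name}'\n"
    "| where CommandLine =~ '{commandline}'"
)


def escape_kql_string(value):
    """Double backslashes and escape single quotes for a single-quoted literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_query(template, **params):
    """Substitute the named parameters into the template."""
    return template.format(**params)


query = build_query(
    QUERY_TEMPLATE,
    table="SecurityEvent",
    start=datetime(2019, 2, 8, 22, 4, 16).isoformat(),
    end=datetime(2019, 2, 14, 22, 4, 16).isoformat(),
    process_name="suchost.exe",
    commandline=escape_kql_string(r".\suchost.exe"),
)
print(query)
```

Escaping matters here because Windows command lines are full of backslashes, which would otherwise be read as escape sequences inside the quoted string.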
# This query needs a commandline parameter which isn't supplied
# by default from the alert
# - so extract and escape this from the process
if not security_alert.primary_process:
    raise ValueError('This alert has no process entity. This section is not applicable.')

proc_match_in_ws = None
commandline = security_alert.primary_process.CommandLine
commandline = mas.utility.escape_windows_path(commandline)

if commandline.strip():
    proc_match_in_ws = qry.list_hosts_matching_commandline(provs=[query_times, security_alert],
                                                           commandline=commandline)
else:
    print('process has empty commandline')

# Check the results
if proc_match_in_ws is None or proc_match_in_ws.empty:
    print('No processes with matching commandline found on other hosts in workspace')
    print('between', query_times.start, 'and', query_times.end)
else:
    hosts = proc_match_in_ws['Computer'].drop_duplicates().shape[0]
    processes = proc_match_in_ws.shape[0]
    print('{numprocesses} processes with matching commandline found on {numhosts} hosts in workspace'
          .format(numprocesses=processes, numhosts=hosts))
    print('between', query_times.start, 'and', query_times.end)
    print('To examine these execute the dataframe \'{}\' in a new cell'.format('proc_match_in_ws'))
    print(proc_match_in_ws[['TimeCreatedUtc', 'Computer', 'NewProcessName', 'CommandLine']].head())
No processes with matching commandline found on other hosts in workspace
between 2019-02-08 22:04:16 and 2019-02-14 22:04:16


Host Logons

This section retrieves the logon events on the host in the alert.

You may want to use the query times to search over a broader range than the default.

# set the origin time to the time of our alert
query_times = mas.QueryTime(units='day', origin_time=security_alert.origin_time,
                            before=1, after=0, max_before=20, max_after=1)
query_times.display()


Alert Logon Account

The logon associated with the process in the alert.

logon_id = security_alert.get_logon_id()

if logon_id:
    if logon_id in ['0x3e7', '0X3E7', '-1', -1]:
        print('Cannot retrieve single logon event for system logon id '
              '- please continue with All Host Logons below.')
    else:
        logon_event = qry.get_host_logon(provs=[query_times, security_alert])
        nbdisp.display_logon_data(logon_event, security_alert)
else:
    print('No account entity in the source alert or the primary account had no logonId value set.')
### Account Logon
Account: MSTICAdmin
Account Domain: MSTICAlertsWin1
Logon Time: 2019-02-13 22:03:42.283000
Logon type: 4 (Batch)
User Id/SID: S-1-5-21-996632719-2361334927-4038480536-500
    SID S-1-5-21-996632719-2361334927-4038480536-500 is administrator
    SID S-1-5-21-996632719-2361334927-4038480536-500 is local machine or domain account
Session id '0x1e821b5'
Subject (source) account: WORKGROUP/MSTICAlertsWin1$
Logon process: Advapi
Authentication: Negotiate
Source IpAddress: -
Source Host: MSTICAlertsWin1
Logon status:

All Host Logons

Since the number of logon events may be large and, in the case of system logons, very repetitive, we use clustering to try to identify logons with unique characteristics.

In this case we use the numeric score of the account name and the logon type (i.e. interactive, service, etc.). The results of the clustered logons are shown below along with a more detailed, readable printout of the logon event information. The data here will vary depending on whether this is a Windows or Linux host.
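To illustrate the idea, the sketch below reduces each logon to a (name score, logon type) pair and keeps one representative per distinct pair. It uses plain grouping rather than DBSCAN, which gives a comparable result only when the cluster distance is very small. The scoring function (sum of character code points) is an assumption standing in for the notebook's helper, and the sample records are hypothetical.

```python
from collections import defaultdict


def string_score(value):
    """Sum of character code points (assumed stand-in for the scoring helper)."""
    return sum(ord(char) for char in value)


def distinct_logon_patterns(logons):
    """Group logons by (account score, logon type); return the earliest of each group."""
    groups = defaultdict(list)
    for logon in logons:
        key = (string_score(logon["Account"]), logon["LogonType"])
        groups[key].append(logon)
    patterns = []
    for members in groups.values():
        first = min(members, key=lambda item: item["TimeGenerated"])
        patterns.append({**first, "ClusterSize": len(members)})
    return sorted(patterns, key=lambda item: item["TimeGenerated"])


# Hypothetical sample records (ISO timestamps sort correctly as strings)
sample_logons = [
    {"Account": "SYSTEM", "LogonType": 5, "TimeGenerated": "2019-02-13T21:10:58"},
    {"Account": "SYSTEM", "LogonType": 5, "TimeGenerated": "2019-02-13T21:40:00"},
    {"Account": "MSTICAdmin", "LogonType": 4, "TimeGenerated": "2019-02-13T22:03:42"},
]
patterns = distinct_logon_patterns(sample_logons)
for pattern in patterns:
    print(pattern["Account"], pattern["LogonType"], pattern["ClusterSize"])
```

A score of this kind is order-insensitive, so two different account names made of the same characters would collide; that is an accepted trade-off for a quick numeric feature.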

from msticpy.sectools.eventcluster import dbcluster_events, add_process_features, _string_score

host_logons = qry.list_host_logons(provs=[query_times, security_alert])
if host_logons is not None and not host_logons.empty:
    logon_features = host_logons.copy()
    logon_features['AccountNum'] = host_logons.apply(lambda x: _string_score(x.Account), axis=1)
    logon_features['LogonHour'] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)

    # you might need to play around with the max_cluster_distance parameter.
    # decreasing this gives more clusters.
    (clus_logons, _, _) = dbcluster_events(data=logon_features, time_column='TimeGenerated',
                                           cluster_columns=['AccountNum', 'LogonType'],
                                           max_cluster_distance=0.0001)
    print('Number of input events:', len(host_logons))
    print('Number of clustered events:', len(clus_logons))
    print('\nDistinct host logon patterns:')
    display(clus_logons.sort_values('TimeGenerated'))
else:
    print('No logon events found for host.')
Number of input events: 22
Number of clustered events: 3

Distinct host logon patterns:
# Display logon details
nbdisp.display_logon_data(clus_logons, security_alert)
### Account Logon
Account: MSTICAdmin
Account Domain: MSTICAlertsWin1
Logon Time: 2019-02-13 22:03:42.283000
Logon type: 4 (Batch)
User Id/SID: S-1-5-21-996632719-2361334927-4038480536-500
    SID S-1-5-21-996632719-2361334927-4038480536-500 is administrator
    SID S-1-5-21-996632719-2361334927-4038480536-500 is local machine or domain account
Session id '0x1e821b5'
Subject (source) account: WORKGROUP/MSTICAlertsWin1$
Logon process: Advapi
Authentication: Negotiate
Source IpAddress: -
Source Host: MSTICAlertsWin1
Logon status:

### Account Logon
Account: SYSTEM
Account Domain: NT AUTHORITY
Logon Time: 2019-02-13 21:10:58.540000
Logon type: 5 (Service)
User Id/SID: S-1-5-18
    SID S-1-5-18 is LOCAL_SYSTEM
Session id '0x3e7' System logon session
Subject (source) account: WORKGROUP/MSTICAlertsWin1$
Logon process: Advapi
Authentication: Negotiate
Source IpAddress: -
Source Host: -
Logon status:

### Account Logon
Account: DWM-2
Account Domain: Window Manager
Logon Time: 2019-02-12 22:22:21.240000
Logon type: 2 (Interactive)
User Id/SID: S-1-5-90-0-2
Session id '0x106b458'
Subject (source) account: WORKGROUP/MSTICAlertsWin1$
Logon process: Advapi
Authentication: Negotiate
Source IpAddress: -
Source Host: -
Logon status:

Comparing All Logons with Clustered results relative to Alert time line

# Show timeline of events - all logons + clustered logons
if host_logons is not None and not host_logons.empty:
    nbdisp.display_timeline(data=host_logons, overlay_data=clus_logons,
                            alert=security_alert,
                            source_columns=['Account', 'LogonType'],
                            title='All Host Logons')
Alert start time = 2019-02-13 22:03:42

View Process Session and Logon Events in Timelines

This shows the timeline of the clustered logon events alongside the process tree obtained earlier. This allows you to get a sense of which logon was responsible for the process tree session and whether any additional logons (e.g. creating a process as another user) might be associated with the alert timeline.

Note: use the pan and zoom tools to align the timelines, since the data may cover different time ranges.

# Show timeline of events - all events
if host_logons is not None and not host_logons.empty:
    nbdisp.display_timeline(data=clus_logons, source_columns=['Account', 'LogonType'],
                            alert=security_alert,
                            title='Clustered Host Logons', height=200)
    try:
        nbdisp.display_timeline(data=process_tree, alert=security_alert,
                                title='Alert Process Session', height=200)
    except NameError:
        print('process_tree not available for this alert.')
Alert start time = 2019-02-13 22:03:42
Alert start time = 2019-02-13 22:03:42
# Counts of Logon types by Account
if host_logons is not None and not host_logons.empty:
    display(host_logons[['Account', 'LogonType', 'TimeGenerated']]
            .groupby(['Account', 'LogonType']).count()
            .rename(columns={'TimeGenerated': 'LogonCount'}))


Failed Logons

failedLogons = qry.list_host_logon_failures(provs=[query_times, security_alert])
if failedLogons.shape[0] == 0:
    # f-string so the time range is interpolated; the range comes from the query time widget
    print(f'No logon failures recorded for this host between {query_times.start} and {query_times.end}')

failedLogons


Appendices

Available DataFrames

print('List of current DataFrames in Notebook')
print('-' * 50)
current_vars = list(locals().keys())
for var_name in current_vars:
    if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):
        print(var_name)
List of current DataFrames in Notebook
--------------------------------------------------
mydf
alert_counts
alert_list
related_alerts
process_tree
processes_on_host
feature_procs
clus_events
source_processes
ioc_df
dec_df
ioc_dec_df
vt_results
pos_vt_results
proc_match_in_ws
logon_event
host_logons
logon_features
clus_logons
failedLogons

Saving Data to CSV

To save the contents of a pandas DataFrame to a CSV file, use the following syntax:

host_logons.to_csv('host_logons.csv')

Saving Data to Excel

To save the contents of a pandas DataFrame to an Excel spreadsheet, use the following syntax:

# ExcelWriter.save() was removed in pandas 2.0; the context manager writes the file on exit
with pd.ExcelWriter('myWorksheet.xlsx') as writer:
    my_data_frame.to_excel(writer, sheet_name='Sheet1')