GitHub Repository: Azure/Azure-Sentinel-Notebooks
Path: blob/master/tutorials-and-examples/deprecated-notebooks/Example - Step-by-Step Linux-Windows-Office Investigation.ipynb

Kernel: Python 3

Title: Sample Hunting and Investigation in Jupyter

Linux, Windows, Network and Office data

Notebook Version: 1.0
[Platform Requirements](#platform_reqs)

Description:

This is an example notebook demonstrating techniques to trace the path of an attacker in an organization. Most of the steps use relatively simple Log Analytics queries but it also includes a few more advanced procedures such as:

Unpacking and decoding Linux Audit logs
Clustering events to collapse repetitive items
Various visualizations

Technically, the narrative in this notebook is more an investigation than hunting, since the starting point is an alert rather than threat intelligence. However, many of the techniques here, such as investigating process activity on Linux and Windows hosts and establishing intercommunication via network analysis, are applicable to hunting scenarios as well.

The Investigation Narrative

Starting from an initial alert (or suspect IP address), examine activity on a Linux host, a Windows host and an Office subscription. Discover malicious activity related to the IP address in each of these.

Warning: Example Notebook - Not for production use!

This notebook is meant to be illustrative of specific scenarios and is not actively maintained.
It is unlikely to be runnable directly in your environment. Instead, please use the notebooks in the root of this repo.

Contents

Setup

Make sure that you have installed packages specified in the setup (uncomment the lines to execute)

Install Packages

If this is the first time running any of the Microsoft Sentinel notebooks you should run the ConfiguringNotebookEnvironment notebook before continuing with this notebook. If you are just viewing the notebook this is not necessary.

Import Packages

Once packages are installed run the next cell to import them.

In [1]:

# Imports
import sys
import warnings

MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

import numpy as np
from IPython import get_ipython
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import networkx as nx

import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 300)

import msticpy.sectools as sectools
import msticpy.nbtools as nbtools
import msticpy.nbtools.entityschema as entity
import msticpy.nbtools.kql as qry
import msticpy.nbtools.nbdisplay as nbdisp

# Some of our dependencies (networkx) still use deprecated Matplotlib
# APIs - we can't do anything about it so suppress them from view
from matplotlib import MatplotlibDeprecationWarning
warnings.simplefilter("ignore", category=MatplotlibDeprecationWarning)

WIDGET_DEFAULTS = {'layout': widgets.Layout(width='95%'),
                   'style': {'description_width': 'initial'}}
display(HTML(nbtools.util._TOGGLE_CODE_PREPARE_STR))

from collections import OrderedDict

# Create an observation collector list
from collections import namedtuple
Observation = namedtuple('Observation', ['caption', 'description', 'item', 'link'])
observation_list = OrderedDict()
def display_observation(observation):
    display(Markdown(f'### {observation.caption}'))
    display(Markdown(observation.description))
    display(Markdown(f'[Go to details](#{observation.link})'))
    display(observation.item)

def add_observation(observation):
    observation_list[observation.caption] = observation

Out[1]:

Part 1 - Threat Intel Report

Getting IoC IP Addresses

Threat intelligence is a vital tool in the armory of hunters and security investigators. Many companies subscribe to threat intelligence feeds from companies like Project Cymru, FireEye, Crowdstrike and others (Microsoft Sentinel customers can use these data subscriptions to enhance their alerts and queries in Microsoft Sentinel). In other cases your threat intel may arrive via a CERT notification or a random tip-off of activity via email.

For the purposes of this notebook, and to make it more accessible to those who don't have ready access to a threat intelligence feed, we're going to scrape some threat intelligence Indicators of Compromise (IoCs) from a public report.

Let's pick a recent report from FireEye: "WinRAR Zero-day Abused in Multiple Campaigns" by Dileep Kumar Jallepalli.

The content of this report is not directly relevant to our investigation. We're just using it as an example of something that you might receive and want to check against, to see if any of the Indicators of Compromise (IoCs) listed in the report show up in your organization. I would stress that this is not a recommended way to consume threat intelligence data from FireEye or any other company and is only done here to provide a starting point for the notebook and accompanying blog. For one thing, you may be in violation of the company's terms of service and, for another, the threat intel listed in these types of reports represents a tiny fraction of the data that these companies provide as part of a commercial agreement.

In [2]:

# This report could equally be an email or some other text that you want to retrieve IoCs from.
import requests
url = 'https://www.fireeye.com/blog/threat-research/2019/03/winrar-zero-day-abused-in-multiple-campaigns.html'
response = requests.get(url)
if response.status_code != 200:
    print('Url request failed')

# We want to extract IoCs from this report
# FireEye list domains and ips with the final dot obfuscated - possibly to deter people from doing what I'm doing here.
# My apologies to FireEye in advance
ip_iocs = str(response.content).replace('[.]', '.')

We want to quickly extract relevant IoCs from reports and emails. Although this isn't the original purpose of this module we can use IoCExtract to help us do this.
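
To make the idea concrete, here is a minimal sketch of what this kind of extraction involves, using only the standard library. It is not the IoCExtract implementation: it reuses the IPv4 pattern that the help output below lists as IPV4_REGEX, while the helper names `refang` and `extract_ipv4` and the sample text are made up for illustration.

```python
import re

# Same pattern that the IoCExtract help output lists as IPV4_REGEX.
IPV4_REGEX = r'(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'


def refang(text):
    """Undo the '[.]' obfuscation that reports often apply to IoCs."""
    return text.replace('[.]', '.')


def extract_ipv4(text):
    """Return IPv4-looking strings, de-duplicated, in order of first appearance."""
    seen = {}
    for match in re.finditer(IPV4_REGEX, refang(text)):
        seen.setdefault(match.group('ipaddress'), None)
    return list(seen)


# Hypothetical report fragment (the addresses are taken from the report output below).
sample = 'C2 servers: 185.49.71[.]101 and 47.91.56[.]21 (seen again: 47.91.56[.]21)'
print(extract_ipv4(sample))
```

Note that the pattern only checks the shape of the string, so something like 999.1.1.1 would also match. Validating each octet is left out to keep the sketch short.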

In [3]:

from msticpy.sectools import IoCExtract
ioc_extr = IoCExtract()

help(IoCExtract)

Out[3]:

Help on class IoCExtract in module msticpy.sectools.iocextract:

class IoCExtract(builtins.object)
 |  IoC Extractor - looks for common IoC patterns in input strings.
 |  
 |  The extract() method takes either a string or a pandas DataFrame
 |  as input. When using the string option as an input extract will
 |  return a dictionary of results. When using a DataFrame the results
 |  will be returned as a new DataFrame with the following columns:
 |  IoCType: the mnemonic used to distinguish different IoC Types
 |  Observable: the actual value of the observable
 |  SourceIndex: the index of the row in the input DataFrame from
 |  which the source for the IoC observable was extracted.
 |  
 |  The class has a number of built-in IoC regex definitions.
 |  These can be retrieved using the ioc_types attribute.
 |  
 |  Addition IoC definitions can be added using the add_ioc_type
 |  method.
 |  
 |  Note: due to some ambiguity in the regular expression patterns
 |  for different types and observable may be returned assigned to
 |  multiple observable types. E.g. 192.168.0.1 is a also a legal file
 |  name in both Linux and Windows. Linux file names have a particularly
 |  large scope in terms of legal characters so it will be quite common
 |  to see other IoC observables (or parts of them) returned as a
 |  possible linux path.
 |  
 |  Methods defined here:
 |  
 |  __init__(self)
 |      Intialize new instance of IoCExtract.
 |  
 |  add_ioc_type(self, ioc_type: str, ioc_regex: str, priority: int = 0, group: str = None)
 |      Add an IoC type and regular expression to use to the built-in set.
 |      
 |      Parameters
 |      ----------
 |      ioc_type : str
 |          A unique name for the IoC type
 |      ioc_regex : str
 |          A regular expression used to search for the type
 |      priority : int, optional
 |          Priority of the regex match vs. other ioc_patterns. 0 is
 |          the highest priority (the default is 0).
 |      group : str, optional
 |          The regex group to match (the default is None,
 |          which will match on the whole expression)
 |      
 |      Notes
 |      -----
 |      Pattern priorities.
 |          If two IocType patterns match on the same substring, the matched
 |          substring is assigned to the pattern/IocType with the highest
 |          priority. E.g. `foo.bar.com` will match types: `dns`, `windows_path`
 |          and `linux_path` but since `dns` has a higher priority, the expression
 |          is assigned to the `dns` matches.
 |  
 |  extract(self, src: str = None, data: pandas.core.frame.DataFrame = None, columns: List[str] = None, os_family='Windows', ioc_types: List[str] = None, include_paths: bool = False) -> Any
 |      Extract IoCs from either a string or pandas DataFrame.
 |      
 |      Parameters
 |      ----------
 |      src : str, optional
 |          source string in which to look for IoC patterns
 |          (the default is None)
 |      data : pd.DataFrame, optional
 |          input DataFrame from which to read source strings
 |          (the default is None)
 |      columns : list, optional
 |          The list of columns to use as source strings,
 |          if the `data` parameter is used. (the default is None)
 |      os_family : str, optional
 |          'Linux' or 'Windows' (the default is 'Windows'). This
 |          is used to toggle between Windows or Linux path matching.
 |      ioc_types : list, optional
 |          Restrict matching to just specified types.
 |          (default is all types)
 |      include_paths : bool, optional
 |          Whether to include path matches (which can be noisy)
 |          (the default is false - excludes 'windows_path'
 |          and 'linux_path'). If `ioc_types` is specified
 |          this parameter is ignored.
 |      
 |      Returns
 |      -------
 |      Any
 |          dict of found observables (if input is a string) or
 |          DataFrame of observables
 |      
 |      Notes
 |      -----
 |      Extract takes either a string or a pandas DataFrame as input.
 |      When using the string option as an input extract will
 |      return a dictionary of results.
 |      When using a DataFrame the results will be returned as a new
 |      DataFrame with the following columns:
 |      - IoCType: the mnemonic used to distinguish different IoC Types
 |      - Observable: the actual value of the observable
 |      - SourceIndex: the index of the row in the input DataFrame from
 |      which the source for the IoC observable was extracted.
 |      
 |      IoCType Pattern selection
 |      The default list is:  ['ipv4', 'ipv6', 'dns', 'url',
 |      'md5_hash', 'sha1_hash', 'sha256_hash'] plus any
 |      user-defined types.
 |      'windows_path', 'linux_path' are excluded unless `include_paths`
 |      is True or explicitly included in `ioc_paths`.
 |  
 |  validate(self, input_str: str, ioc_type: str) -> bool
 |      Check that `input_str` matches the regex for the specificed `ioc_type`.
 |      
 |      Parameters
 |      ----------
 |      input_str : str
 |          the string to test
 |      ioc_type : str
 |          the regex pattern to use
 |      
 |      Returns
 |      -------
 |      bool
 |          True if match.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ioc_types
 |      Return the current set of IoC types and regular expressions.
 |      
 |      Returns
 |      -------
 |      dict
 |          dict of IoC Type names and regular expressions
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  DNS_REGEX = r'((?=[a-z0-9-]{1,63}\.)[a-z0-9]+(-[a-z0-9]+)*\.){2,}[a-z]...
 |  
 |  IPV4_REGEX = r'(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
 |  
 |  IPV6_REGEX = r'(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])...
 |  
 |  LXPATH_REGEX = '(?P<root>/+||[.]+)\n            (?P<folder>/(?:[^...|\...
 |  
 |  MD5_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0...
 |  
 |  SHA1_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{40})(?:$|[^A-Fa-f...
 |  
 |  SHA256_REGEX = '(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa...
 |  
 |  URL_REGEX = "\n            (?P<protocol>(https?|ftp|telnet|lda...nt>([...
 |  
 |  WINPATH_REGEX = '\n            (?P<root>[a-z]:|\\\\\\\\[a-z0-9_.$-]+||...
 |  
 |  __annotations__ = {'_content_regex': typing.Dict[str, msticpy.sectools...
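
The "Pattern priorities" note in the help text can be illustrated with a small sketch. This is only an approximation of the idea, not the library's code: the two regular expressions are deliberately simplified stand-ins, and `PATTERNS` and `classify` are hypothetical names. As in the help text, a lower number means a higher priority.

```python
import re

# Hypothetical, simplified patterns: ioc_type -> (regex, priority).
# A lower priority number wins, mirroring "0 is the highest priority".
PATTERNS = {
    'dns': (r'\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b', 1),
    'linux_path': (r'[\w./-]+', 2),
}


def classify(text):
    """Assign each matched substring to the highest-priority IoC type."""
    best = {}  # substring -> (priority, ioc_type)
    for ioc_type, (regex, priority) in PATTERNS.items():
        for match in re.finditer(regex, text, re.IGNORECASE):
            value = match.group()
            if value not in best or priority < best[value][0]:
                best[value] = (priority, ioc_type)
    return {value: ioc_type for value, (_, ioc_type) in best.items()}


# 'foo.bar.com' matches both patterns but is reported only as 'dns'.
print(classify('foo.bar.com /etc/passwd'))
```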

In [4]:

iocs_found = ioc_extr.extract(src=ip_iocs, ioc_types=['ipv4', 'url', 'dns', 'md5_hash', 'sha1_hash'])
if 'ipv4' in iocs_found:
    print('IPs in report')
    for item in iocs_found['ipv4']:
        print(f'\t{item}')
if 'url' in iocs_found:
    print('Urls in report')
    for item in (url for url in iocs_found['url'] if 'fireeye' not in url.lower()):
        print(f'\t{item}')
if 'md5_hash' in iocs_found:
    print('MD5 Hashes in report')
    for item in iocs_found['md5_hash']:
        print(f'\t{item}')
if 'dns' in iocs_found:
    print('Domains in report')
    for item in (dns for dns in iocs_found['dns'] if 'fireeye' not in dns.lower()):
        print(f'\t{item}')

Out[4]:

IPs in report
	185.49.71.101
	47.91.56.21
	103.225.168.159
	31.148.220.53
	89.34.111.113
	185.162.131.92
Urls in report
	https://twitter.com/360TIC/status/1101022904156741632
	http://tiny-share.com/direct/7dae2d144dae4447a152bef586520ef8
	http://schema.org/ListItem
	http://schema.org/Person
	http://schema.org/BlogPosting
	https://www.win-rar.com/start.html
	https://schema.org/Brand
	http://schema.org/BreadcrumbList
	https://ti.360.net/blog/articles/upgrades-in-winrar-exploit-with-social-engineering-and-encryption/
	https://www.cswe.org/getattachment/Accreditation/Accreditation-Process/Candidacy-Eligibility-Application-Help-Document.pdf.aspx
	http://schema.org/WPHeader
	https://itunes.apple.com/us/podcast/eye-on-security/id1073779629?mt=2
	https://cloud.typography.com/6746836/6977592/css/fonts.css
	https://research.checkpoint.com/extracting-code-execution-from-winrar/
	http://schema.org/WebPage
MD5 Hashes in report
	2961C52F04B7FDF7CCF6C01AC259D767
	96986B18A8470F4020EA78DF0B3DB7D4
	3aabc9767d02c75ef44df6305bc6a41f
	79B53B4555C1FB39BA3C7B8CE9A4287E
	9b81b3174c9b699f594d725cf89ffaa4
	f36404fb24a640b40e2d43c72c18e66b
	7dae2d144dae4447a152bef586520ef8
	31718d7b9b3261688688bdc4e026db99
	062801f6fdbda4dd67b77834c62e82a4
	719d34d31c8e3a6e6fffd425f7e032f3
	9b19753369b6ed1187159b95fc8a81cd
	8e067e4cda99299b0bf2481cc1fd8e12
	12def981952667740eb06ee91168e643
	0f56b04a4e9a0df94c7f89c1bccf830c
	dc63d5affde0db95128dac52f9d19578
	914ac7ecf2557d5836f26a151c1b9b62
	119A0FD733BC1A013B0D4399112B8626
	1f5fa51ac9517d70f136e187d45f69de
	97D74671D0489071BAA21F38F456EB74
	eca09fe8dcbc9d1c097277f2b3ef1081
	49419d84076b13e96540fdd911f1c2f0
	1BA398B0A14328B9604EEB5EBF139B40
	8c93e024fc194f520e4e72e761c0942d
	1322340356018696d853e0ac6f7ce3a2
	BCC49643833A4D8545ED4145FB6FDFD2
	AAC00312A961E81C4AF4664C49B4A2B2
	e9815dfb90776ab449539a2be7c16de5
Domains in report
	www.facebook.com
	csrf.min.js
	fw.min.js
	c09.c09v1.has
	ti.360.net
	6si.min.js
	s7.addthis.com
	Heur.BZC.ONG.Boxter
	utils.min.js
	www.khuyay.org
	Trojan.Win.Azorult
	itunes.apple.com
	fdc.blog.replaceFormWithThankYou
	fw.min.css
	redesign-2018.min.css
	www.win-rar.com
	Exploit.ACE-PathTraversal.Gen
	www.alahbabgroup.com
	Analytics.ClientContextUtils.init
	s.className.replace
	tags.tiqcdn.com
	window.Granite.csrf
	Thumbs.db.lnk
	Candidacy-Eligibility-Application-Help-Document.pdf.aspx
	Trojan.Agent.DPAS
	window.location.href
	geoipResponse.country.iso
	a.parentNode.insertBefore
	Picture7.5.png
	forms2.min.js
	fdc.blog.initCheckboxes
	data.blogs.length
	String.prototype.indexOf.apply
	kernel.min.js
	j.6sc.co
	fdc.geoipResponse.country.iso
	jquery.min.js
	String.prototype.includes
	cloud.typography.com
	www.cswe.org
	Exploit.Agent.VA
	base.min.css
	elem.parentNode.insertBefore
	b.6sc.co
	fdc.geoipResponse.country
	www.youtube.com
	s.parentNode.insertBefore
	shared.min.js
	vnd.microsoft.icon
	nav.min.js
	Generic.MSIL.PasswordStealerA
	modern.min.js
	www.linkedin.com
	Exploit.Agent.UZ
	research.checkpoint.com
	Analytics.SegmentMgr.loadSegments
	granite.min.js
	window.location.pathname

At this point we're going to cheat a little (I said earlier that the FireEye report was not related to the notebook). We will take the list of IPs from the report and add in the IP address of our fictional attacker.

In [5]:

c2_ips = list(iocs_found['ipv4'])
c2_ips.append('23.97.60.214')
c2_ips

Out[5]:

['185.49.71.101',
 '47.91.56.21',
 '103.225.168.159',
 '31.148.220.53',
 '89.34.111.113',
 '185.162.131.92',
 '23.97.60.214']

Authenticate to Microsoft Sentinel

Get the Workspace ID

To find your Workspace Id go to Log Analytics. Look at the workspace properties to find the ID.

In [6]:

import os
from msticpy.nbtools.wsconfig import WorkspaceConfig
ws_config_file = 'config.json'

WORKSPACE_ID = None
TENANT_ID = None
try:
    ws_config = WorkspaceConfig(ws_config_file)
    display(Markdown(f'Read Workspace configuration from local config.json '
                     f'for workspace **{ws_config["workspace_name"]}**'))
    for cf_item in ['tenant_id', 'subscription_id', 'resource_group',
                    'workspace_id', 'workspace_name']:
        display(Markdown(f'**{cf_item.upper()}**: {ws_config[cf_item]}'))
                     
    if ('cookiecutter' not in ws_config['workspace_id'] and
            'cookiecutter' not in ws_config['tenant_id']):
        WORKSPACE_ID = ws_config['workspace_id']
        TENANT_ID = ws_config['tenant_id']
except Exception:
    # Any failure reading config.json falls through to the manual prompts below
    pass

if not WORKSPACE_ID or not TENANT_ID:
    display(Markdown('**Workspace configuration not found.**\n\n'
                     'Please go to your Log Analytics workspace, copy the workspace ID'
                     ' and/or tenant Id and paste here.<br> '
                     'Or read the workspace_id from the config.json '
                     'in your Azure Notebooks project.'))
    ws_config = None
    ws_id = nbtools.GetEnvironmentKey(env_var='WORKSPACE_ID',
                                      prompt='Please enter your Log Analytics Workspace Id:',
                                      auto_display=True)
    ten_id = nbtools.GetEnvironmentKey(env_var='TENANT_ID',
                                       prompt='Please enter your Log Analytics Tenant Id:',
                                       auto_display=True)

Out[6]:

Read Workspace configuration from local config.json for workspace {{cookiecutter.workspace_name}}

TENANT_ID: 72f988bf-86f1-41af-91ab-2d7cd011db47

SUBSCRIPTION_ID: {{cookiecutter.subscription_id}}

RESOURCE_GROUP: {{cookiecutter.resource_group}}

WORKSPACE_ID: 52b1ab41-869e-4138-9e40-2a4457f09bf0

WORKSPACE_NAME: {{cookiecutter.workspace_name}}
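
The output above shows that an unedited config.json can still contain `{{cookiecutter...}}` template placeholders. As an alternative to searching for the word "cookiecutter", one could check that a value parses as a GUID. The sketch below uses only the standard library, and `is_real_guid` is a hypothetical helper that is not part of msticpy.

```python
import uuid


def is_real_guid(value):
    """Return True if value parses as a UUID, False for placeholders or None."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True


print(is_real_guid('52b1ab41-869e-4138-9e40-2a4457f09bf0'))
print(is_real_guid('{{cookiecutter.workspace_id}}'))
```

`uuid.UUID` is fairly lenient about formatting (for example, it accepts values without hyphens), so treat this as a sanity check rather than strict validation.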

Authenticate to Log Analytics

If you are using user/device authentication, run the following cell.

Click the 'Copy code to clipboard and authenticate' button.
This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard.
Select the text box and paste (Ctrl-V/Cmd-V) the copied value.
You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.

Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:

%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)

instead of

%kql loganalytics://code().workspace(WORKSPACE_ID)

Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
On successful authentication you should see a popup schema button.

In [7]:

if not WORKSPACE_ID or not TENANT_ID:
    try:
        WORKSPACE_ID = ws_id.value
        TENANT_ID = ten_id.value
    except NameError:
        raise ValueError('No workspace or Tenant Id.')

nbtools.kql.load_kql_magic()
%kql loganalytics://code().tenant(TENANT_ID).workspace(WORKSPACE_ID)

Out[7]:

Search for C2

Set Query Time Range

Specify a time range to search for alerts. Once this is set, run the following cell to retrieve any alerts in that time window. You can change the time range and re-run the queries until you find the alerts that you want.

In [11]:

from datetime import datetime
search_origin = datetime(2019, 2, 18)
search_q_times = nbtools.QueryTime(units='day', max_before=20,
                                   before=1, max_after=1, origin_time=search_origin)
search_q_times.display()

Out[11]:

HTML(value='<h4>Set query time boundaries</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 18), description='Origin Date'), Text(value='00:00:00',…

VBox(children=(IntRangeSlider(value=(-1, 1), description='Time Range (day):', layout=Layout(width='80%'), max=…
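
For readers viewing this without a live kernel: the widget ultimately resolves to a start and an end timestamp around the origin time. The sketch below shows that arithmetic with the standard library. It assumes (this is not confirmed by the source) that `before` and `after` are simply subtracted from and added to the origin in the chosen units. `query_bounds` is a hypothetical helper, not the QueryTime API.

```python
from datetime import datetime, timedelta


def query_bounds(origin, before=1, after=1, units='day'):
    """Return (start, end) around origin; units is 'day' or 'hour'."""
    step = {'day': timedelta(days=1), 'hour': timedelta(hours=1)}[units]
    return origin - before * step, origin + after * step


# Same origin and default range as the widget above.
start, end = query_bounds(datetime(2019, 2, 18), before=1, after=1)
print(start.isoformat(), end.isoformat())
```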

In [15]:

# Let's query our Microsoft Sentinel data to see if any records 
# contain any of these IPs and print which tables we find them
# in.

query_template = '''
search "{ip_addr}"
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| summarize count() by Type
'''

# Using search repeatedly like this is a bit inefficient - you can get a quick indicator 
# if there are any matches with the syntax:
# search "ipaddr1" or "ipaddr2" or ....
# but if there are matches this doesn't tell you which IP matched.
for ip in c2_ips:
    query = query_template.format(ip_addr=ip,
                                  start=search_q_times.start,
                                  end=search_q_times.end)
    df, result = qry.exec_query_string(query)
    print(f'Searching for ip {ip}...', end=' ')
    if df is not None and not df.empty:
        print(f'Found results for {ip}:')
        display(df)
    else:
        print('no matches found')

Out[15]:

Searching for ip 185.49.71.101... no matches found
Searching for ip 47.91.56.21... no matches found
Searching for ip 103.225.168.159... no matches found
Searching for ip 31.148.220.53... no matches found
Searching for ip 89.34.111.113... no matches found
Searching for ip 185.162.131.92... no matches found
Searching for ip 23.97.60.214... Found results for 23.97.60.214:

We can see that we have 14 alerts in that period that match the final IP in the list. Let's have a look at those.
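
As an aside, the comment in the search cell mentions that a single `search "ipaddr1" or "ipaddr2" or ...` query is a quicker first check. A minimal sketch of building that query string is shown here. `build_combined_search` is a hypothetical helper and the query is only constructed, not executed.

```python
def build_combined_search(ips, start, end):
    """Build one KQL search across all IPs, using the same time filter as above."""
    terms = ' or '.join(f'"{ip}"' for ip in ips)
    return (
        f'search {terms}\n'
        f'| where TimeGenerated >= datetime({start})\n'
        f'| where TimeGenerated <= datetime({end})\n'
        '| summarize count() by Type'
    )


print(build_combined_search(['185.49.71.101', '23.97.60.214'],
                            '2019-02-17', '2019-02-19'))
```

As the original comment points out, a hit from this combined form does not tell you which IP matched, so the per-IP loop is still needed as a follow-up.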

In [24]:

alert_list = qry.list_alerts(provs=[search_q_times])
# Derive the number of alert types from the retrieved alert list
print(alert_list['AlertName'].nunique(), ' distinct alert types')
print(len(alert_list), ' distinct alerts')
display(HTML('<h2>Top alerts</h2>'))
display(alert_list[['AlertName', 'CompromisedEntity', 'TenantId']]
        .groupby(['AlertName', 'CompromisedEntity'])
        .count()
        .rename(columns={'TenantId':'Count'}))

Out[24]:

3  distinct alert types
19  distinct alerts

Examine an Alert

Pick an alert from a list of retrieved alerts.

This section extracts the alert information and entities into a SecurityAlert object allowing us to query the properties more reliably.

In particular, we use the alert to automatically provide parameters for queries and UI elements. Subsequent queries will use properties like the host name and derived properties such as the OS family (Linux or Windows) to adapt the query. Query time selectors like the one above will also default to an origin time that matches the alert selected.

The alert view below shows all of the main properties of the alert plus the extended property dictionary (if any) and JSON representations of the Entity.

Select alert from list

As you select an alert, the main properties will be shown below the list.

Use the filter box to narrow down your search to any substring in the AlertName.

In [25]:

security_alert = None
def show_full_alert(selected_alert):
    global security_alert
    security_alert = nbtools.SecurityAlert(alert_select.selected_alert)
    nbtools.disp.display_alert(security_alert, show_entities=True)
alert_select = nbtools.SelectAlert(alerts=alert_list, action=show_full_alert)
alert_select.display()

Out[25]:

VBox(children=(Text(value='', description='Filter alerts by title:', style=DescriptionStyle(description_width=…

Looking at the SSH Anomalous logons we can see our IP address as the origin IP. Looking at the one SuspiciousFileDownload alert, we can see (buried in the Process Entity) that the same IP address was used as the host address for an HTTP download.

Check alert for IP addresses not contained in entities

Additional IP addresses found in alert are shown below.

In [52]:

# We have the IP address already but we can use the same trick as before
# to pass the alert (squashed into a string) to the IoC extractor to fish out 
# anything interesting
ioc_extractor = sectools.IoCExtract()
new_ips = ioc_extractor.extract(src=str(security_alert), ioc_types=['ipv4', 'ipv6'])

alert_ip_entities = [entity.IpAddress(Address=ip) for ip in new_ips.get('ipv4', [])]
print('IPs in alert\n', alert_ip_entities)

c2_ip_entities = [entity.IpAddress(Address=ip) for ip in c2_ips if ip not in new_ips.get('ipv4', [])]
print('Remaining C2 IPs\n', c2_ip_entities)
# Since we didn't find any matches for the other IPs in the list
# we'll use the IPAddress entity that we just created for further investigation

Out[52]:

IPs in alert
 [{"Address": "23.97.60.214", "Type": "ipaddress"}]
Remaining C2 IPs
 [{"Address": "185.49.71.101", "Type": "ipaddress"}, {"Address": "47.91.56.21", "Type": "ipaddress"}, {"Address": "103.225.168.159", "Type": "ipaddress"}, {"Address": "31.148.220.53", "Type": "ipaddress"}, {"Address": "89.34.111.113", "Type": "ipaddress"}, {"Address": "185.162.131.92", "Type": "ipaddress"}]

Basic IP Checks

Reverse IP and WhoIs

In [29]:

# reverse DNS lookup
from dns import reversename, resolver
from ipwhois import IPWhois
for src_ip_entity in alert_ip_entities:
    print('IP:', src_ip_entity.Address)
    print('-'*50)
    
    print('Reverse Name Lookup.')
    rev_name = reversename.from_address(src_ip_entity.Address)
    
    print(rev_name)
    try:
        rev_dns = str(resolver.query(rev_name, 'PTR'))
        display(rev_dns)
    except Exception:
        print('No reverse addr result')

    print('\nWhoIs Lookup.')
    whois = IPWhois(src_ip_entity.Address)
    whois_result = whois.lookup_whois()
    if whois_result:
        display(whois_result)
    else:
        print('No whois result')

Out[29]:

IP: 23.97.60.214
--------------------------------------------------
Reverse Name Lookup.
214.60.97.23.in-addr.arpa.
No reverse addr result

WhoIs Lookup.

{'nir': None,
 'asn_registry': 'arin',
 'asn': '8075',
 'asn_cidr': '23.96.0.0/14',
 'asn_country_code': 'US',
 'asn_date': '2013-06-18',
 'asn_description': 'MICROSOFT-CORP-MSN-AS-BLOCK - Microsoft Corporation, US',
 'query': '23.97.60.214',
 'nets': [{'cidr': '23.96.0.0/13',
   'name': 'MSFT',
   'handle': 'NET-23-96-0-0-1',
   'range': '23.96.0.0 - 23.103.255.255',
   'description': 'Microsoft Corporation',
   'country': 'US',
   'state': 'WA',
   'city': 'Redmond',
   'address': 'One Microsoft Way',
   'postal_code': '98052',
   'emails': ['[email protected]',
    '[email protected]',
    '[email protected]'],
   'created': '2013-06-18',
   'updated': '2013-06-18'}],
 'raw': None,
 'referral': None,
 'raw_referral': None}
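
Some of the basic checks above can be reproduced offline with Python's standard `ipaddress` module. The sketch below is an addition rather than part of the original notebook. It builds the reverse-lookup name and confirms that the address falls inside the CIDR ranges reported by the WhoIs result.

```python
import ipaddress

ip = ipaddress.ip_address('23.97.60.214')

# Name used for the PTR lookup (the stdlib omits the trailing dot that dnspython prints).
print(ip.reverse_pointer)

# Is the address inside the ranges reported by WhoIs?
print(ip in ipaddress.ip_network('23.96.0.0/14'))
print(ip in ipaddress.ip_network('23.96.0.0/13'))

# Quick sanity check that this is a public, routable address.
print(ip.is_private)
```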

Geo IP Lookup

Where does this communication come from?

In [41]:

from msticpy.sectools.geoip import GeoLiteLookup
iplocation = GeoLiteLookup()

for ip_entity in alert_ip_entities:
    if 'Location' not in ip_entity or not ip_entity.Location:
        iplocation.lookup_ip(ip_entity=ip_entity)
    print(ip_entity)

Out[41]:

{ 'Address': '23.97.60.214',
  'Location': { 'City': 'Singapore',
                'CountryCode': 'SG',
                'CountryName': 'Singapore',
                'Latitude': 1.2931,
                'Longitude': 103.8558,
                'State': 'Central Singapore Community Development Council',
                'Type': 'geolocation'},
  'Type': 'ipaddress'}

In [43]:

# Why not see it on a map? 
# Clicking on the icon gives you the detail of the IP Address location
from msticpy.nbtools.foliummap import FoliumMap
geo_map = FoliumMap()

geo_map.add_ip_cluster(ip_entities=alert_ip_entities, color='red')

# We can add the other C2 Ips
for ip_entity in c2_ip_entities:
    if 'Location' not in ip_entity or not ip_entity.Location:
        iplocation.lookup_ip(ip_entity=ip_entity)
geo_map.add_ip_cluster(ip_entities=c2_ip_entities, color='purple')
display(geo_map.folium_map)

Out[43]:

Threat Intel - Check the IP Address for known malicious addresses

Lookup in Microsoft Sentinel Bring-Your-Own-Threat-Intel

In [55]:

# Lookup in Sentinel Bring-Your-Own-Threat-Intel (or IPReputation/Blacklists)
# The TI Kql query - we're substituting the IP address to search for
ti_query = r'''
BYOThreatIntelv1_CL
| where NetworkIP_s == '{ip}'
| project TimeGenerated, ExternalIndicatorId_s, ThreatType_s,
Description_s, Active_s, TrafficLightProtocolLevel_s,
ConfidenceScore_s, ThreatSeverity_s, ExpirationDateTime_t,
IndicatorId_s, NetworkIP_s, Type
'''.format(ip=alert_ip_entities[0].Address)

# run the query, convert to a dataframe and display any result
%kql -query ti_query
ti_query_df = _kql_raw_result_.to_dataframe()
if len(ti_query_df) > 0:
    display(ti_query_df.T)

Out[55]:

Lookup in VirusTotal

In [45]:

# Get an API key for Virus Total
vt_key = nbtools.GetEnvironmentKey(env_var='VT_API_KEY',
                           help_str='To obtain an API key sign up here https://www.virustotal.com/',
                           prompt='Virus Total API key:')
vt_key.display()

Out[45]:

HBox(children=(Text(value='', description='Vir…

In [51]:

# Lookup the IP Addresses in Virus Total using the msticpy VTLookup class
vt_lookup = sectools.VTLookup(vt_key.value, verbosity=2)

# Let's look for our other C2 IPs - we don't expect our simulated attack 
# address to appear in VT.
# Note, because we're using a free VirusTotal API key here we're limited to
# 4 requests per minute so some requests may error out.
for ip in c2_ip_entities:
    vt_lookup.lookup_ioc(observable=ip.Address, ioc_type='ipv4')
vt_lookup.results.dropna(axis='columns')

Out[51]:

Error parsing response to JSON: "89.34.111.113", type "ipv4". (Source index 0)
Error parsing response to JSON: "185.162.131.92", type "ipv4". (Source index 0)
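
The two errors above are consistent with the rate limit mentioned in the cell comment (4 requests per minute on a free key). One way to avoid them is to space out the lookups. The generator below is a minimal sketch of that idea. `throttled` is a hypothetical helper, not part of msticpy, and the `clock` and `sleep` parameters exist only so the behavior can be tested without waiting.

```python
import time


def throttled(items, per_minute=4, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than per_minute, evenly spaced."""
    interval = 60.0 / per_minute
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the earliest time the following item may be released.
        next_allowed = max(clock(), next_allowed) + interval
        yield item


# Possible usage with the lookup loop above (not executed here):
# for ip in throttled(c2_ip_entities, per_minute=4):
#     vt_lookup.lookup_ioc(observable=ip.Address, ioc_type='ipv4')
```

With the default of 4 per minute, the first item is released immediately and each later item waits until 15 seconds after the previous one.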

End of Part 1

We've seen:

how to search for IoCs across the different data sets in Microsoft Sentinel
how to use IoCExtract to pull out observables from arbitrary text
some of the UI helper widgets, such as the query time selector and alert display, which help with quickly assembling a useful notebook
how to use the GeoIP lookup and mapping tools
how to use the VirusTotal lookup to check IPs for known malware origins

In the next part we'll focus on one of the hosts that we already know has been communicating with one of the suspect IPs and see if we can confirm this to be a successful attack or not. We'll then go on to see what we can learn from network traffic recorded in some of the other data sets to see if the attack has spread beyond this single host.

Part 2 - See What's going on on the Affected Host - Linux

In the next two sections we will examine the host from where the alert originated. In this case it is a Linux host. While we can get some useful information from standard syslog, we have audit logging configured on our hosts to give us detailed process and logon events.

The only tricky part is that the data is not currently in a very friendly format.

This is a good example of combining Log Analytics/Kusto processing with some local Python processing to extract data from arbitrary log types.

In [56]:

host1_q_times = nbtools.QueryTime(label='Set time bounds for alert host - at least 1hr either side of the alert',
                           units='hour', max_before=48, before=2, after=1, 
                           max_after=24, origin_time=security_alert.StartTimeUtc)
host1_q_times.display()

Out[56]:

HTML(value='<h4>Set time bounds for alert host - at least 1hr either side of the alert</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 18), description='Origin Date'), Text(value='15:29:22',…

VBox(children=(IntRangeSlider(value=(-2, 1), description='Time Range (hour):', layout=Layout(width='80%'), max…

Using Linux Audit data to view processes

In [58]:

# First let's look at the raw log
# Scroll over to look at the RawData column contents
%kql AuditLog_CL | where RawData contains 'EXECVE' | take 3

Out[58]:

Linux Audit Logs - To Dos

There are a few things that we need to deal with here:

Splitting and unpacking the fields in each rawdata field
Some events (like process exec) have multiple rows associated with them - we need to join these together into a single row
Some string fields are hex-encoded (this is to allow embedded characters like spaces)
We also need to extract the timestamp from the msg field (this is stored as a Unix timestamp float)
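
The to-do items above can be illustrated with a few small functions. This is a simplified sketch using only the standard library, not the msticpy implementation used later in the notebook. The function names are hypothetical, and the sample field values are made up for illustration.

```python
from datetime import datetime, timezone


def split_fields(body):
    """Split 'k1=v1 k2=v2' into a dict; tokens without '=' are ignored."""
    fields = {}
    for token in body.split(" "):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value.strip('"')
    return fields


def decode_hex(value):
    """Decode a hex-encoded string; return the input unchanged if it is not hex."""
    try:
        raw = bytes.fromhex(value)
    except ValueError:
        return value
    return raw.decode("utf-8", errors="replace")


def parse_msg_id(mssg_id):
    """Split '<unix_time>:<serial>' into (UTC datetime, serial number)."""
    ts_text, _, serial = mssg_id.partition(":")
    return datetime.fromtimestamp(float(ts_text), tz=timezone.utc), int(serial)


if __name__ == "__main__":
    # Made-up sample values, for illustration only
    fields = split_fields('argc=2 a0="ls" a1=2F746D70')
    print(fields["a0"], decode_hex(fields["a1"]))
    print(parse_msg_id("1550503762.0:4567"))
```

Note that `decode_hex` should only be applied to fields that are expected to be hex-encoded, since a short plain value such as `cafe` is also valid hex. Splitting on spaces also breaks quoted values that contain spaces, which is one reason such values are hex-encoded in the log.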

In [59]:

# We use Kusto to do as much of the heavy lifting as possible.
# This query splits the RawData field into message type, message ID (with timestamp) and message data fields (lines 5 and 6)
# line 7 - get rid of unwanted columns
# line 8 - split the message body into an array of key=value strings
# line 9 - pack the message type and list of contents into a dictionary {'Type': [k1=v1, k2=v2...]}
# line 10 - group by mssg_id and pack the individual typed_mssg dictionaries into a list of dictionaries

linux_events = r'''
AuditLog_CL
| where Computer has '{hostname}'
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| extend mssg_parts = extract_all(@"type=(?P<type>[^\s]+)\s+msg=audit\((?P<mssg_id>[^)]+)\):\s+(?P<mssg>[^\r]+)\r?", dynamic(['type', 'mssg_id', 'mssg']), RawData)
| extend mssg_type = tostring(mssg_parts[0][0]), mssg_id = tostring(mssg_parts[0][1])
| project TenantId, TimeGenerated, Computer, mssg_type, mssg_id, mssg_parts
| extend mssg_content = split(mssg_parts[0][2],' ')
| extend typed_mssg = pack(mssg_type, mssg_content)
| summarize AuditdMessage = makelist(typed_mssg) by TenantId, TimeGenerated, Computer, mssg_id
'''.format(start=host1_q_times.start, end=host1_q_times.end,
           hostname=security_alert.hostname)
print('getting data...')
%kql -query linux_events
linux_events_df = _kql_raw_result_.to_dataframe()
print(f'{len(linux_events_df)} raw auditd mssgs downloaded')

Out[59]:

getting data...
17413 raw auditd mssgs downloaded

In [60]:

# Look at a sample of the output
linux_events_df[['Computer', 'TimeGenerated', 'mssg_id', 'AuditdMessage']][0:10]

Out[60]:

In [61]:

# We still have some work to do using the auditdextract module from msticpy
# This will do
# - splitting the key=value strings
# - the hex decoding of any encoded strings
# - type conversion for int fields
# - for SYSCALL/EXECVE rows we'll do some extra processing to identify the executable that ran
# and re-assemble the commandline arguments
# - extracting the real timestamp and replacing the original TimeGenerated column (since this was
# just the log import time, not the event time, which is what we are after)
from msticpy.sectools.auditdextract import extract_events_to_df, get_event_subset
linux_events_all = extract_events_to_df(linux_events_df, verbose=True)

Out[61]:

Unpacking auditd messages for 17413 events...
Building output dataframe...
Fixing timestamps...
Complete. 17413 output rows time: 13.280467 sec

In [62]:

# Look at a sample - this isn't very clear. We'll see better below.
linux_events_all[0:5]

Out[62]:

Event Types collected

It's useful to get an overview of what events we are dealing with. The graph is dominated by SYSCALL_EXECVE (process exec events). We're displaying on a log scale; otherwise the very low volume events would be invisible.

In [63]:

sns.set()
(linux_events_all[['EventType', 'TimeGenerated']]
     .groupby('EventType').count().rename(columns={'TimeGenerated': 'EventCount'})
     .sort_values('EventCount', ascending=True)
     .plot.barh(logx=True, figsize=(12,6)));

Out[63]:

View events by Type - Process (SYSCALL) and Login events are covered in more detail below. Use this to look at some of the rarer event types to see anything unusual.

In [64]:

# Let's look at the audit messages by type
from ipywidgets import interactive

# We get the distinct list of event types
items = sorted(linux_events_all['EventType'].unique().tolist())

# this is a nice way of using a Select (list) widget to filter the display
# of the pandas dataframe. The interactive() call below tells the widget
# to call the view function each time an item is selected. The value of the
# item (EventType) is passed to the function and we use it to filter the DataFrame
# before displaying it.
def view(x=''):
    display(linux_events_all[linux_events_all['EventType']==x]
            .drop(['EventType', 'TenantId', 'Computer', 'mssg_id'], axis=1)
            .dropna(axis=1, how='all'))

w = widgets.Select(options=items, description='Select Event Type', **WIDGET_DEFAULTS)
interactive(view, x=w)

Out[64]:

interactive(children=(Select(description='Select Event Type', layout=Layout(width='95%'), options=('CRED_ACQ',…

Extract Individual Event Types for logon and process events

In [65]:

from msticpy.sectools.auditdextract import extract_events_to_df, get_event_subset
lx_proc_create = get_event_subset(linux_events_all,'SYSCALL_EXECVE')
print(f'{len(lx_proc_create)} Process Create Events')

lx_login = (get_event_subset(linux_events_all, 'LOGIN')
        .merge(get_event_subset(linux_events_all, 'CRED_ACQ'), 
               how='inner',
               left_on=['old-ses', 'pid', 'uid'], 
               right_on=['ses', 'pid', 'uid'],
               suffixes=('', '_cred')).drop(['old-ses','TenantId_cred', 
                                             'Computer_cred'], axis=1)
        .dropna(axis=1, how='all'))
print(f'{len(lx_login)} Login Events')

Out[65]:

13046 Process Create Events
269 Login Events

Failure Events

Failure events can sometimes tell us about attempts to probe the system that haven't quite worked. Login failures will show up here as well.

In [66]:

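# Note: the match value includes a trailing single quote ("failed'"). This is presumably because
# res is the last field inside the quoted msg='...' string, so the closing quote stays attached.
lx_fail_events = (linux_events_all[linux_events_all['res'] == "failed'"]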
                    .drop(['TenantId', 'mssg_id'], axis=1)
                    .dropna(axis=1, how='all'))
if len(lx_fail_events) > 0:
    display(lx_fail_events)
    add_observation(Observation(caption='Failure events on Linux host.',
                               description='One or more failure events detected on host.',
                               item=lx_fail_events,
                               link='linux_failure_events'))

Extract IPs from all Events

In [67]:

# Search all events for an addr field with an IP address (we're looking for any string containing a '.').
# Drop duplicates and localhost and return a list
events_with_ips = (linux_events_all[['EventType','addr']]
                   [linux_events_all['addr'].str.contains(r'\.', na=False)]
                   .drop_duplicates())

# Display any events found
display(events_with_ips)
# Get unique IPs and drop localhost
host_ext_ips = list(events_with_ips['addr'].drop_duplicates().to_dict().values())
if '127.0.0.1' in host_ext_ips:
    host_ext_ips.remove('127.0.0.1')
display(host_ext_ips)

Out[67]:

['23.97.60.214']
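
Matching any string that contains a '.' will also pick up hostnames and other dotted text. A stricter filter can be sketched with the standard library `ipaddress` module. The function name `non_loopback_ips` is hypothetical, and the sketch assumes the input is a plain list of values such as the contents of the `addr` column.

```python
import ipaddress


def non_loopback_ips(values):
    """Return unique, syntactically valid, non-loopback IPs, preserving order."""
    found = []
    for value in values:
        if not isinstance(value, str):
            continue  # skip missing values such as NaN
        try:
            addr = ipaddress.ip_address(value.strip())
        except ValueError:
            continue  # hostnames, '?', or other non-IP text
        if addr.is_loopback:
            continue
        text = str(addr)
        if text not in found:
            found.append(text)
    return found


if __name__ == "__main__":
    sample = ["23.97.60.214", "127.0.0.1", "host.example.com", "?", "23.97.60.214"]
    print(non_loopback_ips(sample))
```

This keeps private addresses such as 10.x.x.x; the `is_private` attribute could be checked in the same way if only public addresses are wanted.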

Get Logins with IP Address Recorded

In [68]:

# From the logon events that we separated out a few cells back
# we can get the full event details of logons with external IPs                  

logins_with_ips = (lx_login[lx_login['addr'] != '?']
                   [['Computer', 'TimeGenerated','pid', 'ses', 
                     'acct', 'addr', 'exe', 'hostname', 'msg',
                     'res_cred', 'ses_cred', 'terminal']])
if len(logins_with_ips) > 0:
    display(logins_with_ips)
    add_observation(Observation(caption='Login events with source Ip addresses',
                                description=f'{len(logins_with_ips)} logins with external addresses',
                                item=logins_with_ips,
                                link='linux_login_ips'))

Out[68]:

What's happening in these sessions?

If there are a lot of events here try the Process Clustering section below.

In [69]:

# We can view the processes run by this logon by using the same DataFrame
# filtering trick.
# We don't have massive numbers of events but there is a lot of clutter and 
# it's not immediately obvious that anything bad is happening
items = sorted(lx_login[lx_login['addr'] != '?']['ses'].unique().tolist())

def view(x=''):
    procs = (lx_proc_create[lx_proc_create['ses']==x]
                [['TimeGenerated', 'exe','cmdline', 'pid','cwd']])
    display(Markdown(f'{len(procs)} process events'))
    display(procs)

w = widgets.Select(options=items, description='Select Session', **WIDGET_DEFAULTS)
interactive(view, x=w)

Out[69]:

interactive(children=(Select(description='Select Session', layout=Layout(width='95%'), options=(196045,), styl…

Find Distinctive Process Patterns - Clustering

We can get rid of a lot of the clutter in the process data by clustering. We'll look at this in more detail in the next part, but it essentially collapses repetitive events into single items, allowing us to focus on distinctive events.
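
To make the idea of collapsing repetitive events concrete, here is a deliberately simplified stand-in. It is not the DBSCAN-based clustering that the notebook uses through msticpy: it simply groups events that share an executable name and the same number of command-line tokens, and keeps one representative per group. The function name and the event schema (`time`, `exe`, `cmdline`) are assumptions made for this illustration.

```python
from collections import defaultdict


def collapse_events(events):
    """Group events that share coarse features and keep one representative each.

    events: iterable of dicts with 'time', 'exe' and 'cmdline' keys
    (a hypothetical schema). Returns one summary dict per group,
    ordered by the time of the first event in the group.
    """
    groups = defaultdict(list)
    for event in events:
        # Coarse feature key: executable plus number of command-line tokens
        key = (event["exe"], len(event["cmdline"].split()))
        groups[key].append(event)

    collapsed = []
    for members in groups.values():
        members.sort(key=lambda e: e["time"])
        first = members[0]
        collapsed.append({
            "exe": first["exe"],
            "cmdline": first["cmdline"],
            "first_time": first["time"],
            "last_time": members[-1]["time"],
            "cluster_size": len(members),
        })
    return sorted(collapsed, key=lambda c: c["first_time"])


if __name__ == "__main__":
    sample = [
        {"time": 1, "exe": "/bin/ls", "cmdline": "ls -l /tmp"},
        {"time": 2, "exe": "/bin/ls", "cmdline": "ls -l /var"},
        {"time": 3, "exe": "/usr/bin/wget", "cmdline": "wget http://example.test/x.sh"},
    ]
    for item in collapse_events(sample):
        print(item)
```

Groups with a small `cluster_size` are the distinctive events worth reading first; large groups are usually routine, repetitive activity.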

In [70]:

# To use the clustering library we're going to cheat
# a little and make the Linux events look a bit more like
# Windows events. This isn't completely necessary but makes
# the code a bit simpler.

lx_to_proc_create = {'acct': 'SubjectUserName',
                     'uid': 'SubjectUserSid',
                     'user': 'SubjectUserName',
                     'ses': 'SubjectLogonId',
                     'pid': 'NewProcessId',
                     'exe': 'NewProcessName',
                     'ppid': 'ProcessId',
                     'cmdline': 'CommandLine',}

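# Note: 'SubjectUserName' appears twice below. A dict literal keeps the last
# value for a repeated key, so this reverse mapping resolves it to 'user'.
proc_create_to_lx = {'SubjectUserName': 'acct',
                     'SubjectUserSid': 'uid',
                     'SubjectUserName': 'user',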
                     'SubjectLogonId': 'ses',
                     'NewProcessId': 'pid',
                     'NewProcessName': 'exe',
                     'ProcessId': 'ppid',
                     'CommandLine': 'cmdline',}

lx_to_logon = {'acct': 'SubjectUserName',
               'auid': 'SubjectUserSid',
               'user': 'TargetUserName',
               'uid': 'TargetUserSid',
               'ses': 'TargetLogonId',
               'exe': 'LogonProcessName',
               'terminal': 'LogonType',
               'msg': 'AuthenticationPackageName',
               'res': 'Status',
               'addr': 'IpAddress',
               'hostname': 'WorkstationName',}

logon_to_lx = {'SubjectUserName': 'acct',
               'SubjectUserSid': 'auid',
               'TargetUserName': 'user',
               'TargetUserSid': 'uid',
               'TargetLogonId': 'ses',
               'LogonProcessName': 'exe',
               'LogonType': 'terminal',
               'AuthenticationPackageName': 'msg',
               'Status': 'res',
               'IpAddress': 'addr',
               'WorkstationName': 'hostname',}

lx_proc_create_trans = lx_proc_create.rename(columns=lx_to_proc_create)
lx_login_trans = lx_login.rename(columns=lx_to_logon)

In [71]:

# For demo purposes we're actually running the clustering 
# algorithm against all 13k or so process exec events
# and we can see that it's reduced the unique items to 1%
# of the original volume
print('analyzing data...')
from msticpy.sectools.eventcluster import dbcluster_events, add_process_features

feature_procs_h1 = add_process_features(input_frame=lx_proc_create_trans,
                                        path_separator=security_alert.path_separator)


# you might need to play around with the max_cluster_distance parameter.
# decreasing this gives more clusters.
(clus_events, dbcluster, x_data) = dbcluster_events(data=feature_procs_h1,
                                                    cluster_columns=['commandlineTokensFull', 
                                                                     'pathScore',
                                                                    'SubjectUserSid'],
                                                    time_column='TimeGenerated',
                                                    max_cluster_distance=0.0001)
print('Number of input events:', len(feature_procs_h1))
print('Number of clustered events:', len(clus_events))
(clus_events.sort_values('TimeGenerated')[['TimeGenerated', 'LastEventTime',
                                           'NewProcessName', 'CommandLine', 
                                           'ClusterSize', 'commandlineTokensFull',
                                           'SubjectLogonId', 'SubjectUserSid',
                                           'pathScore', 'isSystemSession']]
    .sort_values('ClusterSize', ascending=True));

Out[71]:

analyzing data...
Number of input events: 13046
Number of clustered events: 138

In [72]:

# Let's try viewing our session again
# For interactive sessions the compression won't be as good
# but we've reduced it to about 20% of the original

def view(x=''):
    procs = (clus_events[clus_events['SubjectLogonId']==x]
            [['TimeGenerated', 'NewProcessName','CommandLine', 
              'NewProcessId', 'SubjectUserSid', 'cwd', 'ClusterSize', 'SubjectLogonId']])
    display(Markdown(f'{len(procs)} process events'))
    display(procs)

w = widgets.Select(options=items, description='Select Session to view', **WIDGET_DEFAULTS)
interactive(view, x=w)

Out[72]:

interactive(children=(Select(description='Select Session to view', layout=Layout(width='95%'), options=(196045…

Badness Uncovered!

On a single screen we can now scan down the whole session and see pretty quickly some very suspicious activity:

Reconnaissance - getting machine info, contents of /etc/passwd and mail
Downloading a script and making it executable
The crontab command is not entirely clear (likely the start of a pipeline) but it seems a good bet that the script is being installed as a cron job

In [73]:

# Let's save our first piece of real evidence in our summary collection
selected_session = w.value
add_observation(Observation(caption='Suspicious Process Session on Linux Host.',
                            description='Attempt to download and run script + recon cmds.',
                            item = clus_events.query('SubjectLogonId == @selected_session & ClusterSize < 3'),
                            link='linux_proc_cluster'))

Part 2b - Host Network Data

Get the IP Address of the Source Host

In [74]:

                   

host_entities = [e for e in security_alert.entities if isinstance(e, nbtools.Host)]
if len(host_entities) == 1:
    alert_host_entity = host_entities[0]
    host_name = alert_host_entity.HostName
    resource = alert_host_entity.AzureID
else:
    host_name = None
    alert_host_entity = None
    print('Error: Could not determine host entity from alert. Please type the hostname below')
# Fall back to an empty string: the Text widget value must be a string, not None
txt_wgt = widgets.Text(value=host_name or '', description='Confirm Source Host name:', **WIDGET_DEFAULTS)
display(txt_wgt)

Out[74]:

Text(value='MSTICALERTSLXVM2', description='Confirm Source Host name:', layout=Layout(width='95%'), style=Desc…

In [75]:

             

print('Looking for IP addresses of ', txt_wgt.value)
aznet_query = '''
AzureNetworkAnalytics_CL 
| where VirtualMachine_s has \'{host}\'
| where ResourceType == 'NetworkInterface'
| top 1 by TimeGenerated desc
| project PrivateIPAddresses = PrivateIPAddresses_s, 
    PublicIPAddresses = PublicIPAddresses_s
'''.format(host=txt_wgt.value)
%kql -query aznet_query
az_net_df = _kql_raw_result_.to_dataframe()


oms_heartbeat_query = '''
Heartbeat
| where Computer has \'{host}\'
| top 1 by TimeGenerated desc nulls last
| project ComputerIP, OSType, OSMajorVersion, OSMinorVersion, ResourceId, RemoteIPCountry, 
RemoteIPLatitude, RemoteIPLongitude, SourceComputerId
'''.format(host=txt_wgt.value)
%kql -query oms_heartbeat_query
oms_heartbeat_df = _kql_raw_result_.to_dataframe()
display(oms_heartbeat_df[['ComputerIP']])
display(az_net_df)

print('getting data...')
# Get the host entity and add this IP and system info to it
try:
    if not inv_host_entity:
        inv_host_entity = entity.Host()
        inv_host_entity.HostName = host_name
except NameError:
    inv_host_entity = entity.Host()
    inv_host_entity.HostName = host_name

def convert_to_ip_entities(ip_str):
    ip_entities = []
    if ip_str:
        if ',' in ip_str:
            addrs = ip_str.split(',')
        elif ' ' in ip_str:
            addrs = ip_str.split(' ')
        else:
            addrs = [ip_str]
        for addr in addrs:
            ip_entity = entity.IpAddress()
            ip_entity.Address = addr.strip()
            iplocation.lookup_ip(ip_entity=ip_entity)
            ip_entities.append(ip_entity)
    return ip_entities

# Add this information to our inv_host_entity
retrieved_address=[]
if len(az_net_df) == 1:
    priv_addr_str = az_net_df['PrivateIPAddresses'].loc[0]
    inv_host_entity.properties['private_ips'] = convert_to_ip_entities(priv_addr_str)

    pub_addr_str = az_net_df['PublicIPAddresses'].loc[0]
    inv_host_entity.properties['public_ips'] = convert_to_ip_entities(pub_addr_str)
    retrieved_address = [ip.Address for ip in inv_host_entity.properties['public_ips']]
else:
    if 'private_ips' not in inv_host_entity.properties:
        inv_host_entity.properties['private_ips'] = []
    if 'public_ips' not in inv_host_entity.properties:
        inv_host_entity.properties['public_ips'] = []
        
if len(oms_heartbeat_df) == 1:
    if oms_heartbeat_df['ComputerIP'].loc[0]:
        oms_address = oms_heartbeat_df['ComputerIP'].loc[0]
        if oms_address not in retrieved_address:
            ip_entity = entity.IpAddress()
            ip_entity.Address = oms_address
            iplocation.lookup_ip(ip_entity=ip_entity)
            inv_host_entity.properties['public_ips'].append(ip_entity)
        
    inv_host_entity.OSFamily = oms_heartbeat_df['OSType'].loc[0]
    inv_host_entity.AdditionalData['OSMajorVersion'] = oms_heartbeat_df['OSMajorVersion'].loc[0]
    inv_host_entity.AdditionalData['OSMinorVersion'] = oms_heartbeat_df['OSMinorVersion'].loc[0]
    inv_host_entity.AdditionalData['SourceComputerId'] = oms_heartbeat_df['SourceComputerId'].loc[0]

print('Updated Host Entity\n')
print(inv_host_entity)

Out[75]:

Looking for IP addresses of  MSTICALERTSLXVM2

getting data...
Updated Host Entity

{ 'AdditionalData': { 'OSMajorVersion': '18',
                      'OSMinorVersion': '04',
                      'SourceComputerId': '44623fb0-bd5f-49ea-84d1-56aa11ab8a25'},
  'HostName': 'MSTICALERTSLXVM2',
  'OSFamily': 'Linux',
  'Type': 'host',
  'private_ips': [{"Address": "10.0.3.4", "Type": "ipaddress"}],
  'public_ips': [ {"Address": "104.211.30.1", "Location": {"CountryCode": "US", "CountryName": "United States", "State": "Virginia", "City": "Washington", "Longitude": -78.1704, "Latitude": 38.7163, "Type": "geolocation"}, "Type": "ipaddress"}]}

Check Communications with Other Hosts

In [76]:

                   

# Azure Network Analytics Base Query
az_net_analytics_query =r'''
AzureNetworkAnalytics_CL 
| where SubType_s == 'FlowLog'
| where FlowStartTime_t >= datetime({start})
| where FlowEndTime_t <= datetime({end})
| project TenantId, TimeGenerated, 
    FlowStartTime = FlowStartTime_t, 
    FlowEndTime = FlowEndTime_t, 
    FlowIntervalEndTime = FlowIntervalEndTime_t, 
    FlowType = FlowType_s,
    ResourceGroup = split(VM_s, '/')[0],
    VMName = split(VM_s, '/')[1],
    VMIPAddress = VMIP_s, 
    PublicIPs = extractall(@"([\d\.]+)[|\d]+", dynamic([1]), PublicIPs_s),
    SrcIP = SrcIP_s,
    DestIP = DestIP_s,
    ExtIP = iif(FlowDirection_s == 'I', SrcIP_s, DestIP_s),
    L4Protocol = L4Protocol_s, 
    L7Protocol = L7Protocol_s, 
    DestPort = DestPort_d, 
    FlowDirection = FlowDirection_s,
    AllowedOutFlows = AllowedOutFlows_d, 
    AllowedInFlows = AllowedInFlows_d,
    DeniedInFlows = DeniedInFlows_d, 
    DeniedOutFlows = DeniedOutFlows_d,
    RemoteRegion = AzureRegion_s,
    VMRegion = Region_s
| extend AllExtIPs = iif(isempty(PublicIPs), pack_array(ExtIP), 
                         iif(isempty(ExtIP), PublicIPs, array_concat(PublicIPs, pack_array(ExtIP)))
                         )
| project-away ExtIP
| mvexpand AllExtIPs
{where_clause}
'''

ip_q_times = nbtools.QueryTime(label='Set time bounds for network queries',
                           units='hour', max_before=48, before=10, after=5, 
                           max_after=24, origin_time=security_alert.StartTimeUtc)
ip_q_times.display()

Out[76]:

HTML(value='<h4>Set time bounds for network queries</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 18), description='Origin Date'), Text(value='15:29:22',…

VBox(children=(IntRangeSlider(value=(-10, 5), description='Time Range (hour):', layout=Layout(width='80%'), ma…

Query Flows by Host IP Addresses

In [77]:

                   

all_alert_host_ips = inv_host_entity.private_ips + inv_host_entity.public_ips
host_ips = {'\'{}\''.format(i.Address) for i in all_alert_host_ips}
alert_host_ip_list = ','.join(host_ips)

az_ip_where = f'''
| where (VMIPAddress in ({alert_host_ip_list}) 
        or SrcIP in ({alert_host_ip_list}) 
        or DestIP in ({alert_host_ip_list}) 
        ) and 
    (AllowedOutFlows > 0 or AllowedInFlows > 0)'''
print('getting data...')
az_net_query_byip = az_net_analytics_query.format(where_clause=az_ip_where,
                                                  start = ip_q_times.start,
                                                  end = ip_q_times.end)

net_default_cols = ['FlowStartTime', 'FlowEndTime', 'VMName', 'VMIPAddress', 
                'PublicIPs', 'SrcIP', 'DestIP', 'L4Protocol', 'L7Protocol',
                'DestPort', 'FlowDirection', 'AllowedOutFlows', 
                'AllowedInFlows']

%kql -query az_net_query_byip
az_net_comms_df = _kql_raw_result_.to_dataframe()
az_net_comms_df[net_default_cols]

Out[77]:

getting data...
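
The where clause above is assembled by string formatting, so it can help to validate the addresses before they reach the query text. The sketch below shows one way to do this with the standard library. The helper names `kql_ip_list` and `flows_where_clause` are hypothetical; the column names are taken from the base query above.

```python
import ipaddress


def kql_ip_list(addresses):
    """Return a comma-separated list of single-quoted IPs for a KQL `in (...)` clause.

    Raises ValueError if any item is not a valid IP address, so arbitrary
    text never reaches the query string.
    """
    unique = sorted({str(ipaddress.ip_address(addr)) for addr in addresses})
    if not unique:
        # An empty `in ()` list would not be a valid query
        raise ValueError("at least one IP address is required")
    return ", ".join(f"'{ip}'" for ip in unique)


def flows_where_clause(addresses):
    """Build the allowed-flows filter for the given host IP addresses."""
    ip_list = kql_ip_list(addresses)
    return (
        f"| where (VMIPAddress in ({ip_list})\n"
        f"        or SrcIP in ({ip_list})\n"
        f"        or DestIP in ({ip_list}))\n"
        "    and (AllowedOutFlows > 0 or AllowedInFlows > 0)"
    )


if __name__ == "__main__":
    print(flows_where_clause(["104.211.30.1", "10.0.3.4"]))
```

Sorting the de-duplicated addresses also makes the generated query text deterministic, which is convenient when comparing runs.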

Flow Time and Protocol Distribution

In [78]:

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    
    az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']
    sns.catplot(x="L7Protocol", y="TotalAllowedFlows", col="FlowDirection", data=az_net_comms_df)
    sns.relplot(x="FlowStartTime", y="TotalAllowedFlows", 
                col="FlowDirection", kind="line", 
                hue="L7Protocol", data=az_net_comms_df).set_xticklabels(rotation=50)

Out[78]:

Isolated SSH traffic

In [79]:

az_net_comms_df.query('FlowDirection == \'I\' & L7Protocol == \'ssh\'')[net_default_cols]

Out[79]:

Seems suspicious, so Record findings

In [80]:



ext_ip_list = az_net_comms_df.query('FlowDirection == \'I\' & L7Protocol == \'ssh\'')['AllExtIPs'].tolist()

for ip in ext_ip_list:
    if not ip:
        continue
    # Check IP is not already in our list of entities
    if ip in [curr_ip.Address for curr_ip in alert_ip_entities]:
        continue
    ip_entity = entity.IpAddress(Address=ip)
    iplocation.lookup_ip(ip_entity=ip_entity)
    
    alert_ip_entities.append(ip_entity)
    
add_observation(Observation(caption='Outlier SSH session on Linux Host.',
                            description='''Plot of in/out flows shows unexpected ssh inbound. 
Ip Address confirmed as logon source for SSH.''',
                            item = az_net_comms_df.query('FlowDirection == \'I\' & L7Protocol == \'ssh\''),
                            link='net_flow_graphs'))

Interactive Flow Timeline

In [81]:


nbdisp.display_timeline(data=az_net_comms_df.query('AllowedOutFlows > 0'),
                         overlay_data=az_net_comms_df.query('AllowedInFlows > 0'),
                         alert=security_alert, title='Network Flows (out=blue, in=green)',
                         time_column='FlowStartTime',
                         source_columns=['FlowType', 'AllExtIPs', 'L7Protocol', 'FlowDirection'],
                         height=300)

Out[81]:

Alert start time =  2019-02-18 15:29:22


GeoLocation Mapping

In [86]:



ip_locs_in = set()
ip_locs_out = set()
for _, row in az_net_comms_df.iterrows():
    ip = row.AllExtIPs

    if ip in ip_locs_in or ip in ip_locs_out or not ip:
        continue
    ip_entity = entity.IpAddress(Address=ip)
    iplocation.lookup_ip(ip_entity=ip_entity)
    if not ip_entity.Location:
        continue
    ip_entity.AdditionalData['protocol'] = row.L7Protocol
    if row.FlowDirection == 'I':
        ip_locs_in.add(ip_entity)
    else:
        ip_locs_out.add(ip_entity)

flow_map = FoliumMap()
display(HTML('<h3>External IP Addresses communicating with host</h3>'))
display(HTML('Numbered circles indicate multiple items - click to expand'))
display(HTML('Location markers: Blue = outbound, Purple = inbound, Green = Host'))

flow_map.add_ip_cluster(ip_entities=inv_host_entity.public_ips,
                        color='green')
flow_map.add_ip_cluster(ip_entities=ip_locs_out,
                        color='blue')
flow_map.add_ip_cluster(ip_entities=ip_locs_in,
                        color='purple')

display(flow_map.folium_map)
display(Markdown('<p style="color:red">Warning: the folium mapping library '
                 'does not display correctly in some browsers.</p><br>'
                 'If you see a blank image please retry with a different browser.'))

Out[86]:

Warning: the folium mapping library does not display correctly in some browsers.

If you see a blank image please retry with a different browser.

Look at 'Denied' Flows - who's trying to get in from where?

Optional and can take a long time

In [89]:

# Set this to False to skip this optional, potentially slow query
if True:
    az_ip_where = f'''
    | where (VMIPAddress in ({alert_host_ip_list}) 
            or SrcIP in ({alert_host_ip_list}) 
            or DestIP in ({alert_host_ip_list}) 
            )'''

    az_net_query_byip = az_net_analytics_query.format(where_clause=az_ip_where,
                                                      start = ip_q_times.start,
                                                      end = ip_q_times.end)
    %kql -query az_net_query_byip
    az_net_comms_all_df = _kql_raw_result_.to_dataframe()

    ip_all = set()
    ip_locs_in_allow = set()
    ip_locs_out_allow = set()
    ip_locs_in_deny = set()
    ip_locs_out_deny = set()
    for _, row in az_net_comms_all_df.iterrows():
        if not row.PublicIPs:
            continue
        for ip in row.PublicIPs:
            if ip in ip_all:
                continue
            ip_all.add(ip)
            ip_entity = entity.IpAddress(Address=ip)
            iplocation.lookup_ip(ip_entity=ip_entity)
            if not ip_entity.Location:
                print("No location information for IP: ", ip)
                continue
            ip_entity.AdditionalData['protocol'] = row.L7Protocol
            if row.FlowDirection == 'I':
                if row.AllowedInFlows > 0:
                    ip_locs_in_allow.add(ip_entity)
                elif row.DeniedInFlows > 0:
                    ip_locs_in_deny.add(ip_entity)
            else:
                if row.AllowedOutFlows > 0:
                    ip_locs_out_allow.add(ip_entity)
                elif row.DeniedOutFlows > 0:
                    ip_locs_out_deny.add(ip_entity)

    flow_map = FoliumMap()
    display(HTML('<h3>External IP Addresses Blocked and Allowed communicating with host</h3>'))
    display(HTML('Numbered circles indicate multiple items - click to expand.'))
    display(HTML('Location markers: Blue = outbound, Purple = inbound, Red = in denied, Cyan = out denied.'))

    flow_map.add_ip_cluster(ip_entities=ip_locs_in_allow,
                            color='purple')
    flow_map.add_ip_cluster(ip_entities=ip_locs_out_allow,
                            color='blue')
    flow_map.add_ip_cluster(ip_entities=ip_locs_in_deny,
                            color='red')
    flow_map.add_ip_cluster(ip_entities=ip_locs_out_deny,
                            color='cyan')
    display(flow_map.folium_map)
    
    display(Markdown('<p style="color:red">Warning: the folium mapping library '
                     'does not display correctly in some browsers.</p><br>'
                     'If you see a blank image please retry with a different browser.'))

Out[89]:

No location information for IP:  193.32.161.50
No location information for IP:  88.214.26.103
No location information for IP:  81.22.45.116
No location information for IP:  88.214.26.77
No location information for IP:  81.22.45.102
No location information for IP:  88.214.26.38
No location information for IP:  141.98.80.150
No location information for IP:  193.32.160.69
No location information for IP:  194.61.24.198
No location information for IP:  81.22.45.81
No location information for IP:  81.22.45.106

e:\src\microsoft\msticpy\msticpy\msticpy\nbtools\foliummap.py:73: RuntimeWarning:

Invalid location information for IP: 194.113.106.162

e:\src\microsoft\msticpy\msticpy\msticpy\nbtools\foliummap.py:73: RuntimeWarning:

Invalid location information for IP: 194.147.32.125

Warning: the folium mapping library does not display correctly in some browsers.

If you see a blank image please retry with a different browser.

DNS Activity Includes any of these IPs?

In [90]:


dns_query =r'''
DnsEvents
| where ClientIP in ({ip_list})
'''.format(ip_list=', '.join([f'\'{ip.Address}\'' for ip in alert_ip_entities]))

%kql -query dns_query
dns_df = _kql_raw_result_.to_dataframe()
dns_df

Out[90]:

Have any other hosts been communicating with this address(es)?

In [1]:

ip_q_times = nbtools.QueryTime(units='day', max_before=10, before=3, after=1, max_after=10, origin_time=security_alert.StartTimeUtc)
ip_q_times.display()

Out[1]:

---------------------------------------------------
NameError         Traceback (most recent call last)
<ipython-input-1-f6d8f83b1c37> in <module>
----> 1 ip_q_times = nbtools.QueryTime(units='day', max_before=10, before=3, after=1, max_after=10, origin_time=security_alert.StartTimeUtc)
      2 ip_q_times.display()
NameError: name 'nbtools' is not defined

In [92]:


alert_ips = {'\'{}\''.format(i.Address) for i in alert_ip_entities}
alert_host_ip_list = ','.join(alert_ips)

az_ip_where = f'| where AllExtIPs in ({alert_host_ip_list})'

az_net_query_by_pub_ip = az_net_analytics_query.format(where_clause=az_ip_where,
                                                       start = ip_q_times.start,
                                                       end = ip_q_times.end)
print('getting data...')
%kql -query az_net_query_by_pub_ip
az_net_ext_comms_df = _kql_raw_result_.to_dataframe()
az_net_ext_comms_df[net_default_cols]

# az_net_ext_comms_df.groupby(['VMName', 'L7Protocol'])['AllowedOutFlows','AllowedInFlows','DeniedInFlows','DeniedOutFlows'].sum()

Out[92]:

getting data...

In [93]:



inv_host_ips = [ent.Address for ent in inv_host_entity.private_ips]
inv_host_ips += [ent.Address for ent in inv_host_entity.public_ips]

alert_ips = [ip.Address for ip in alert_ip_entities]

known_ips = inv_host_ips + alert_ips

# Ips can be in one of 4 columns!
def find_new_ips(known_ips, row):
    new_ips = set()
    if row.VMIPAddress and row.VMIPAddress not in known_ips:
        new_ips.add(row.VMIPAddress)
    if row.SrcIP and row.SrcIP not in known_ips:
        new_ips.add(row.SrcIP)
    if row.DestIP and row.DestIP not in known_ips:
        new_ips.add(row.DestIP)
    if row.PublicIPs:
        for pub_ip in row.PublicIPs:
            if pub_ip not in known_ips:
                new_ips.add(pub_ip)
    if new_ips:            
        return list(new_ips)

new_ips_all = az_net_ext_comms_df.apply(lambda x: find_new_ips(known_ips, x), axis=1).dropna()
new_ips = set()
for ip in [ip for item in new_ips_all for ip in item]:
    new_ips.add(ip)
display(Markdown(f'#### {len(new_ips)} unseen IP Address found in this data: {list(new_ips)}'))

Out[93]:

1 unseen IP Address found in this data: ['10.0.3.5']

Note: re-run this section for each new IP address found to determine which host it belongs to.
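
The "unseen IP" check in the cell above can be sketched without pandas. This is a minimal illustration, assuming each flow record is a plain dictionary with the same four address columns; the function name `find_unseen_ips` and the sample rows are hypothetical and are not part of the notebook or of msticpy.

```python
def find_unseen_ips(rows, known_ips):
    """Return the set of addresses in `rows` that are not in `known_ips`."""
    known = set(known_ips)
    unseen = set()
    for row in rows:
        # Single-valued address columns
        for col in ("VMIPAddress", "SrcIP", "DestIP"):
            ip = row.get(col)
            if ip and ip not in known:
                unseen.add(ip)
        # PublicIPs holds a list of addresses (it may be missing or empty)
        for ip in row.get("PublicIPs") or []:
            if ip not in known:
                unseen.add(ip)
    return unseen


# Hypothetical flow records, for illustration only
sample_rows = [
    {"VMIPAddress": "10.0.3.4", "SrcIP": "", "DestIP": "", "PublicIPs": ["23.97.60.214"]},
    {"VMIPAddress": "10.0.3.5", "SrcIP": None, "DestIP": None, "PublicIPs": ["23.97.60.214"]},
]
unseen = find_unseen_ips(sample_rows, known_ips=["10.0.3.4", "23.97.60.214"])
print(sorted(unseen))
```

Using a set for the known addresses keeps each membership test cheap, and returning a set means an address that appears in several rows is reported once.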

In [94]:

items = list(new_ips)
ip_w = widgets.Select(options=items, 
                   description='Select ip address to search for',
                   value=items[0] if items else None,
                   **WIDGET_DEFAULTS)
display(ip_w)

Out[94]:

Select(description='Select ip address to search for', layout=Layout(width='95%'), options=('10.0.3.5',), style…

In [95]:



vm_ip = ip_w.value
aznet_query = '''
AzureNetworkAnalytics_CL 
| where PrivateIPAddresses_s has \'{vm_ip}\'
| where ResourceType == 'NetworkInterface'
| top 1 by TimeGenerated desc
| project PrivateIPAddresses = PrivateIPAddresses_s, 
    PublicIPAddresses = PublicIPAddresses_s,
    VirtualMachine = VirtualMachine_s
| extend Host = split(VirtualMachine, '/')[-1]
'''.format(vm_ip=vm_ip)
%kql -query aznet_query
az_net_df = _kql_raw_result_.to_dataframe()
if len(az_net_df) > 0:
    host_name = az_net_df['Host'].at[0]

oms_heartbeat_query = '''
Heartbeat
| where ComputerIP == \'{vm_ip}\'
| top 1 by TimeGenerated desc nulls last
| project Computer, ComputerIP, OSType, OSMajorVersion, OSMinorVersion, ResourceId, RemoteIPCountry, 
RemoteIPLatitude, RemoteIPLongitude, SourceComputerId
'''.format(vm_ip=vm_ip)
%kql -query oms_heartbeat_query
oms_heartbeat_df = _kql_raw_result_.to_dataframe()
if len(oms_heartbeat_df) > 0:
    host_name = oms_heartbeat_df['Computer'].at[0]
    

# Create the victim host entity (if it does not already exist) and set its host name
try:
    if not victim_host_entity:
        victim_host_entity = entity.Host()
        victim_host_entity.HostName = host_name
except NameError:
    victim_host_entity = entity.Host()
    victim_host_entity.HostName = host_name

def convert_to_ip_entities(ip_str):
    ip_entities = []
    if ip_str:
        if ',' in ip_str:
            addrs = ip_str.split(',')
        elif ' ' in ip_str:
            addrs = ip_str.split(' ')
        else:
            addrs = [ip_str]
        for addr in addrs:
            ip_entity = entity.IpAddress()
            ip_entity.Address = addr.strip()
            iplocation.lookup_ip(ip_entity=ip_entity)
            ip_entities.append(ip_entity)
    return ip_entities

# Add this information to our victim_host_entity
retrieved_pub_addresses = []
if len(az_net_df) == 1:
    priv_addr_str = az_net_df['PrivateIPAddresses'].loc[0]
    victim_host_entity.properties['private_ips'] = convert_to_ip_entities(priv_addr_str)

    pub_addr_str = az_net_df['PublicIPAddresses'].loc[0]
    victim_host_entity.properties['public_ips'] = convert_to_ip_entities(pub_addr_str)
    retrieved_pub_addresses = [ip.Address for ip in victim_host_entity.properties['public_ips']]
    
if len(oms_heartbeat_df) == 1:
    if oms_heartbeat_df['ComputerIP'].loc[0]:
        oms_address = oms_heartbeat_df['ComputerIP'].loc[0]
        if oms_address not in retrieved_pub_addresses:
            ip_entity = entity.IpAddress()
            ip_entity.Address = oms_address
            iplocation.lookup_ip(ip_entity=ip_entity)
            # add to the victim host (not inv_host_entity); create the list if missing
            victim_host_entity.properties.setdefault('public_ips', []).append(ip_entity)
        
    victim_host_entity.OSFamily = oms_heartbeat_df['OSType'].loc[0]
    victim_host_entity.AdditionalData['OSMajorVersion'] = oms_heartbeat_df['OSMajorVersion'].loc[0]
    victim_host_entity.AdditionalData['OSMinorVersion'] = oms_heartbeat_df['OSMinorVersion'].loc[0]
    victim_host_entity.AdditionalData['SourceComputerId'] = oms_heartbeat_df['SourceComputerId'].loc[0]

print(f'Found New Host Entity {victim_host_entity.HostName}\n')
print(victim_host_entity)

add_observation(Observation(caption=f'Second victim host identified {victim_host_entity.HostName}',
                            description='Description of host entity shown in attachment.',
                            item=victim_host_entity,
                            link='other_hosts_to_ips'))

Out[95]:

Found New Host Entity msticalertswin1

{ 'HostName': 'msticalertswin1',
  'Type': 'host',
  'private_ips': [{"Address": "10.0.3.5", "Type": "ipaddress"}],
  'public_ips': [ {"Address": "40.76.43.124", "Location": {"CountryCode": "US", "CountryName": "United States", "State": "Virginia", "City": "Washington", "Longitude": -78.1704, "Latitude": 38.7163, "Type": "geolocation"}, "Type": "ipaddress"}]}

In [97]:

alert_ip_entities

Out[97]:

[{"Address": "23.97.60.214", "Type": "ipaddress"}]

In [98]:



sns.set()
from matplotlib import MatplotlibDeprecationWarning
warnings.simplefilter("ignore", category=MatplotlibDeprecationWarning)

ip_graph = nx.DiGraph(id='IPGraph')

def add_vm_node(graph, host_entity):
    vm_name = host_entity.HostName
    vm_ip = host_entity.private_ips[0].Address
    # describe the node by host name only: no `row` (resource group, region)
    # is in scope inside this function
    vm_desc = f'{host_entity.HostName}'
    graph.add_node(vm_ip, name=vm_name, description=vm_desc,
                   node_type='host')

for ip_entity in alert_ip_entities:
    if 'Location' not in ip_entity:
        iplocation.lookup_ip(ip_entity=ip_entity)
    if 'Location' in ip_entity:
        ip_desc = f'{ip_entity.Address}\n{ip_entity.Location.City}, {ip_entity.Location.CountryName}'
    else:
        ip_desc = 'unknown location'
    ip_graph.add_node(ip_entity.Address, name=ip_entity.Address, description=ip_desc, node_type='ip')

add_vm_node(ip_graph, inv_host_entity)
add_vm_node(ip_graph, victim_host_entity)


def add_edges(graph, row):
    dest_ip = row.DestIP if row.DestIP else row.VMIPAddress
    if row.SrcIP:
        graph.add_edge(row.SrcIP, dest_ip)
    else:
        # no source IP recorded: link each public IP to the destination
        for ip in row.PublicIPs:
            graph.add_edge(ip, dest_ip)

    
# Add edges from network data
az_net_ext_comms_df.apply(lambda x: add_edges(ip_graph, x),axis=1)

src_node = [n for (n, node_type) in
            nx.get_node_attributes(ip_graph, 'node_type').items()
            if node_type == 'ip']
vm_nodes = [n for (n, node_type) in
            nx.get_node_attributes(ip_graph, 'node_type').items()
            if node_type == 'host']

# now draw them in subsets  using the `nodelist` arg
plt.rcParams['figure.figsize'] = (10, 10)
plt.margins(x=0.3, y=0.3)
plt.title('Comms between hosts and suspect IPs')
pos = nx.circular_layout(ip_graph)
nx.draw_networkx_nodes(ip_graph, pos, nodelist=src_node,
                       node_color='red', alpha=0.5, node_shape='o')
nx.draw_networkx_nodes(ip_graph, pos, nodelist=vm_nodes,
                       node_color='green', alpha=0.5, node_shape='s',
                       node_size=400)
# label each node with its description attribute
nlabels = nx.get_node_attributes(ip_graph, 'description')
nx.draw_networkx_labels(ip_graph, pos, nlabels, font_size=15)
nx.draw_networkx_edges(ip_graph, pos, alpha=0.5, arrows=True, arrowsize=20);

Out[98]:


Part 3 - Windows Host and Office 365

Other Hosts Communicating with IP


Check Host Logons

In [99]:



from msticpy.nbtools.query_defns import DataFamily, DataEnvironment
params_dict = {}
params_dict.update(security_alert.query_params)
params_dict['host_filter_eq'] = f'Computer has \'{victim_host_entity.HostName}\''
params_dict['host_filter_neq'] = f'Computer !has \'{victim_host_entity.HostName}\''
params_dict['host_name'] = victim_host_entity.HostName
params_dict['subscription_filter'] = 'true'
if victim_host_entity.OSFamily == 'Linux':
    params_dict['data_family'] = DataFamily.LinuxSecurity
    params_dict['path_separator'] = '/'
else:
    params_dict['data_family'] = DataFamily.WindowsSecurity
    params_dict['path_separator'] = '\\'

# set the origin time to the time of our alert
logon_query_times = nbtools.QueryTime(units='day', origin_time=security_alert.origin_time,
                                  before=5, after=1, max_before=20, max_after=20)
logon_query_times.display()

Out[99]:

HTML(value='<h4>Set query time boundaries</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 18), description='Origin Date'), Text(value='15:29:28',…

VBox(children=(IntRangeSlider(value=(-5, 1), description='Time Range (day):', layout=Layout(width='80%'), max=…

In [100]:



from msticpy.sectools.eventcluster import dbcluster_events, add_process_features, _string_score

host_logons = qry.list_host_logons(provs=[logon_query_times], **params_dict)


if len(host_logons) > 0:
    logon_features = host_logons.copy()
    logon_features['AccountNum'] = host_logons.apply(lambda x: _string_score(x.Account), axis=1)
    logon_features['LogonIdNum'] = host_logons.apply(lambda x: _string_score(x.TargetLogonId), axis=1)
    logon_features['LogonHour'] = host_logons.apply(lambda x: x.TimeGenerated.hour, axis=1)

    # you might need to play around with the max_cluster_distance parameter.
    # decreasing this gives more clusters.
    (clus_logons, _, _) = dbcluster_events(data=logon_features, time_column='TimeGenerated',
                                           cluster_columns=['AccountNum',
                                                            'LogonType'],
                                           max_cluster_distance=0.0001)
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (12, 4)
    clus_logons.plot.barh(x="Account", y="ClusterSize")

    display(Markdown(f'Number of input events: {len(host_logons)}'))
    display(Markdown(f'Number of clustered events: {len(clus_logons)}'))
    display(Markdown('#### Distinct host logon patterns'))
    clus_logons.sort_values('TimeGenerated')
    nbdisp.display_logon_data(clus_logons)
else:
    display(Markdown('No logon events found for host.'))

Out[100]:

Number of input events: 201

Number of clustered events: 10

Distinct host logon patterns

### Account Logon
Account:  MSTICAdmin
Account Domain:  MSTICAlertsWin1
Logon Time:  2019-02-15 03:57:02.593000
Logon type: 10  (RemoteInteractive)
User Id/SID:  S-1-5-21-996632719-2361334927-4038480536-500
    SID S-1-5-21-996632719-2361334927-4038480536-500 is administrator
    SID S-1-5-21-996632719-2361334927-4038480536-500 is local machine or domain account
Session id '0x109c408'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  User32 
Authentication:  Negotiate
Source IpAddress:  131.107.147.209
Source Host:  MSTICAlertsWin1
Logon status:  

### Account Logon
Account:  SYSTEM
Account Domain:  NT AUTHORITY
Logon Time:  2019-02-14 04:20:54.370000
Logon type: 0  (Unknown)
User Id/SID:  S-1-5-18
    SID S-1-5-18 is LOCAL_SYSTEM
Session id '0x3e7'  System logon session

Subject (source) account:  -/-
Logon process:  -
Authentication:  -
Source IpAddress:  -
Source Host:  -
Logon status:  

### Account Logon
Account:  LOCAL SERVICE
Account Domain:  NT AUTHORITY
Logon Time:  2019-02-14 04:20:54.803000
Logon type: 5  (Service)
User Id/SID:  S-1-5-19
    SID S-1-5-19 is LOCAL_SERVICE
Session id '0x3e5'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  Advapi  
Authentication:  Negotiate
Source IpAddress:  -
Source Host:  -
Logon status:  

### Account Logon
Account:  IUSR
Account Domain:  NT AUTHORITY
Logon Time:  2019-02-14 04:20:56.110000
Logon type: 5  (Service)
User Id/SID:  S-1-5-17
Session id '0x3e3'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  Advapi  
Authentication:  Negotiate
Source IpAddress:  -
Source Host:  -
Logon status:  

### Account Logon
Account:  SYSTEM
Account Domain:  NT AUTHORITY
Logon Time:  2019-02-18 08:24:22.400000
Logon type: 5  (Service)
User Id/SID:  S-1-5-18
    SID S-1-5-18 is LOCAL_SYSTEM
Session id '0x3e7'  System logon session

Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  Advapi  
Authentication:  Negotiate
Source IpAddress:  -
Source Host:  -
Logon status:  

### Account Logon
Account:  ian
Account Domain:  MSTICAlertsWin1
Logon Time:  2019-02-18 13:46:03.590000
Logon type: 4  (Batch)
User Id/SID:  S-1-5-21-996632719-2361334927-4038480536-1120
    SID S-1-5-21-996632719-2361334927-4038480536-1120 is local machine or domain account
Session id '0x52884d4'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  Advapi  
Authentication:  Negotiate
Source IpAddress:  -
Source Host:  MSTICAlertsWin1
Logon status:  

### Account Logon
Account:  ian
Account Domain:  MSTICAlertsWin1
Logon Time:  2019-02-16 03:24:41.980000
Logon type: 3  (Network)
User Id/SID:  S-1-5-21-996632719-2361334927-4038480536-1120
    SID S-1-5-21-996632719-2361334927-4038480536-1120 is local machine or domain account
Session id '0x25d84ea'  
Subject (source) account:  -/-
Logon process:  NtLmSsp 
Authentication:  NTLM
Source IpAddress:  23.97.60.214
Source Host:  MSTICRemoteWin1
Logon status:  

### Account Logon
Account:  DWM-4
Account Domain:  Window Manager
Logon Time:  2019-02-16 03:24:49.093000
Logon type: 2  (Interactive)
User Id/SID:  S-1-5-90-0-4
Session id '0x25dc260'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  Advapi  
Authentication:  Negotiate
Source IpAddress:  -
Source Host:  -
Logon status:  

### Account Logon
Account:  ian
Account Domain:  MSTICAlertsWin1
Logon Time:  2019-02-16 03:24:49.500000
Logon type: 10  (RemoteInteractive)
User Id/SID:  S-1-5-21-996632719-2361334927-4038480536-1120
    SID S-1-5-21-996632719-2361334927-4038480536-1120 is local machine or domain account
Session id '0x25df86b'  
Subject (source) account:  WORKGROUP/MSTICAlertsWin1$
Logon process:  User32 
Authentication:  Negotiate
Source IpAddress:  23.97.60.214
Source Host:  MSTICAlertsWin1
Logon status:  

### Account Logon
Account:  MSTICAdmin
Account Domain:  MSTICAlertsWin1
Logon Time:  2019-02-15 03:56:57.070000
Logon type: 3  (Network)
User Id/SID:  S-1-5-21-996632719-2361334927-4038480536-500
    SID S-1-5-21-996632719-2361334927-4038480536-500 is administrator
    SID S-1-5-21-996632719-2361334927-4038480536-500 is local machine or domain account
Session id '0x1096a6d'  
Subject (source) account:  -/-
Logon process:  NtLmSsp 
Authentication:  NTLM
Source IpAddress:  131.107.147.209
Source Host:  IANHELLE-DEV17
Logon status:  

Classification of Logon Types by Account

In [101]:



display(Markdown('### Counts of logon events by logon type.'))
display(Markdown('Min counts for each logon type highlighted.'))
logon_by_type = (host_logons[['Account', 'LogonType', 'EventID']]
                .groupby(['Account','LogonType']).count().unstack()
                .fillna(0)
                .style
                .background_gradient(cmap='viridis', low=.5, high=0)
                .format("{0:0>3.0f}"))
display(logon_by_type)
key = 'logon type key = {}'.format('; '.join([f'{k}: {v}' for k,v in nbtools.nbdisplay._WIN_LOGON_TYPE_MAP.items()]))
display(Markdown(key))

display(Markdown('### Logon Timeline.'))
nbdisp.display_timeline(data=host_logons,
                         overlay_data=host_logons.query('LogonType == 10'),
                         alert=security_alert, 
                         source_columns=['Account', 'LogonType', 'TimeGenerated'],
                         title='All Host Logons (RDP Logons in green)')

add_observation(Observation(caption='RDP Logons seen for victim #2',
                            description='Logons by logon type.',
                            item=logon_by_type,
                            link='victim2_logon_types'))

Out[101]:

Counts of logon events by logon type.

Min counts for each logon type highlighted.

logon type key = 0: Unknown; 2: Interactive; 3: Network; 4: Batch; 5: Service; 7: Unlock; 8: NetworkCleartext; 9: NewCredentials; 10: RemoteInteractive; 11: CachedInteractive

Logon Timeline.


Alert start time =  2019-02-18 15:29:22



Check for Failed Logons

In [102]:



failedLogons = qry.list_host_logon_failures(provs=[logon_query_times], **params_dict)
if failedLogons.shape[0] == 0:
    print(f'No logon failures recorded for this host between {logon_query_times.start} and {logon_query_times.end}')
else:
    display(failedLogons)
    add_observation(Observation(caption='Logon failures seen for victim #2',
                                description=f'{len(failedLogons)} failed logons seen.',
                                item=failedLogons,
                                link='failed_logons'))

Out[102]:


Examine a Logon Session

Select a Logon ID To Examine

In [103]:



import re
dist_logons = clus_logons.sort_values('TimeGenerated')[['TargetUserName', 'TimeGenerated', 
                                                        'LastEventTime', 'LogonType', 
                                                        'ClusterSize']]
items = dist_logons.apply(lambda x: (f'{x.TargetUserName}:    '
                                     f'(logontype={x.LogonType})   '
                                     f'timerange={x.TimeGenerated} - {x.LastEventTime}    '
                                     f'count={x.ClusterSize}'),
                          axis=1).values.tolist()
def get_selected_logon_cluster(selected_item):
    acct_match = re.search(r'(?P<acct>[^:]+):\s+\(logontype=(?P<l_type>[^)]+)', selected_item)
    if acct_match:
        acct = acct_match['acct']
        l_type = int(acct_match['l_type'])
        return host_logons.query('TargetUserName == @acct and LogonType == @l_type')

def get_selected_logon(selected_item):
    logon_list_regex = r'''
(?P<acct>[^:]+):\s+
\(logontype=(?P<logon_type>[^)]+)\)\s+
\(timestamp=(?P<time>[^)]+)\)\s+
logonid=(?P<logonid>[0-9a-fx)]+)
'''
    acct_match = re.search(logon_list_regex, selected_item, re.VERBOSE)
    if acct_match:
        acct = acct_match['acct']
        logon_type = int(acct_match['logon_type'])
        time_stamp = pd.to_datetime(acct_match['time'])
        logon_id = acct_match['logonid']
        return host_logons.query('TargetUserName == @acct and LogonType == @logon_type'
                                 ' and TargetLogonId == @logon_id')
    
logon_wgt = nbtools.SelectItem(description='Select logon cluster to examine', 
                             item_list=items, height='200px', width='100%', auto_display=True)

Out[103]:

Select(description='Select logon cluster to examine', layout=Layout(height='200px', width='100%'), options=('S…
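
The selection widgets in this section store each logon as a formatted string and recover the fields from the selected string with a regular expression. The round trip can be sketched with the standard library alone. The helper names `format_logon_item` and `parse_logon_item` are hypothetical, and the sample values are copied from the logon output shown earlier in this section.

```python
import re

# Verbose-mode pattern: literal whitespace in the pattern is ignored,
# so the separators between fields are matched explicitly with \s+.
LOGON_ITEM_REGEX = re.compile(
    r"""
    (?P<acct>[^:]+):\s+
    \(logontype=(?P<logon_type>\d+)\)\s+
    \(timestamp=(?P<time>[^)]+)\)\s+
    logonid=(?P<logonid>0x[0-9a-f]+)
    """,
    re.VERBOSE,
)


def format_logon_item(acct, logon_type, timestamp, logon_id):
    """Build the display string shown in the selection list."""
    return (f"{acct}:    (logontype={logon_type})   "
            f"(timestamp={timestamp})    logonid={logon_id}")


def parse_logon_item(item):
    """Recover the fields from a display string, or None if it does not match."""
    match = LOGON_ITEM_REGEX.search(item)
    if not match:
        return None
    fields = match.groupdict()
    fields["logon_type"] = int(fields["logon_type"])
    return fields


item = format_logon_item("ian", 10, "2019-02-16 03:24:49.500000", "0x25df86b")
parsed = parse_logon_item(item)
print(parsed)
```

Encoding the key fields in the label avoids keeping a separate lookup table, at the cost of having to keep the format string and the pattern in step with each other.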

In [104]:



selected_logon_cluster = get_selected_logon_cluster(logon_wgt.value)

def view_logon(x=''):
    global selected_logon
    selected_logon = get_selected_logon(x)
    display(selected_logon)
    

items = selected_logon_cluster.sort_values('TimeGenerated').apply(lambda x: (f'{x.TargetUserName}:    '
                                        f'(logontype={x.LogonType})   '
                                        f'(timestamp={x.TimeGenerated})    '
                                        f'logonid={x.TargetLogonId}'),
                             axis=1).values.tolist()
w = widgets.Select(options=items, description='Select logon instance to examine', **WIDGET_DEFAULTS)
    
interactive(view_logon, x=w)

Out[104]:

interactive(children=(Select(description='Select logon instance to examine', layout=Layout(width='95%'), optio…


Unusual Processes on Host - Clustering

Sometimes you don't have a source process to work with. Other times it's just useful to see what else is going on on the host. This section retrieves all processes on the host within the time bounds set in the query times widget.

You can display the raw output of this by looking at the processes_on_host dataframe. Just copy this into a new cell and hit Ctrl-Enter.

Usually, though, the results contain many repetitive and uninteresting system processes, so we cluster them to make the view easier to navigate. To do this we process the raw event list to extract a few features that render strings (such as the command line) into numerical values. The default below uses the following features:

commandlineTokensFull - a count of common delimiters in the command line (given by this regex r'[\s-\/.,"'|&:;%$()]'). The aim is to capture the command-line structure while ignoring variations on what is essentially the same pattern (e.g. temporary path GUIDs, target IP or host names).
pathScore - the sum of the ordinal (character) value of each character in the path (so /bin/bash and /bin/bosh would have similar scores).
isSystemSession - 1 if this is a root/system session, 0 if anything else.

Then we run a clustering algorithm (DBSCAN in this case) on the process list. The result groups similar (noisy) processes together and leaves unique process patterns as single-member clusters.
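
The first two features described above can be sketched in a few lines of standard-library Python. This is an illustration of the idea only, not the msticpy implementation: the function names are hypothetical, and the character class is an approximation of the delimiter set quoted above, with the hyphen and backslash escaped so that it compiles under Python's `re` module. The clustering step itself is not shown.

```python
import re

# Approximation of the delimiter set described above (assumption: hyphen and
# backslash are escaped so the class is valid for Python's re module).
DELIMITERS = re.compile(r"""[\s\-\\/.,"'|&:;%$()]""")


def delimiter_count(command_line):
    """Count delimiter characters to capture the shape of a command line."""
    return len(DELIMITERS.findall(command_line or ""))


def path_score(path):
    """Sum the ordinal value of every character in the path."""
    return sum(ord(ch) for ch in (path or ""))


# Two command lines with the same structure but different arguments
# produce the same delimiter count.
print(delimiter_count("cmd.exe /c echo hi"))
print(delimiter_count("cmd.exe /c echo yo"))

# Similar paths produce nearby scores.
print(path_score("/bin/bash"), path_score("/bin/bosh"))
```

Because both features are plain numbers, rows that differ only in argument values or temporary names end up close together in feature space, which is what lets a density-based algorithm group them.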

In [105]:



# Calculate time range based on the logons from previous section
logon_time = selected_logon_cluster['TimeGenerated'].min()
last_logon_time = selected_logon_cluster['TimeGenerated'].max()
time_diff = int((last_logon_time - logon_time).total_seconds() / (60 * 60) + 2)

# set the origin time to the earliest logon in the selected cluster
proc_query_times = nbtools.QueryTime(units='hours', origin_time=logon_time,
                           before=1, after=time_diff, max_before=20, max_after=20)
proc_query_times.display()

Out[105]:

HTML(value='<h4>Set query time boundaries</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 15), description='Origin Date'), Text(value='19:54:24.5…

VBox(children=(IntRangeSlider(value=(-1, 9), description='Time Range (hour):', layout=Layout(width='80%'), max…

In [106]:



from msticpy.sectools.eventcluster import dbcluster_events, add_process_features
print('Getting process events...', end='')
processes_on_host = qry.list_processes(provs=[proc_query_times], **params_dict)
print('done')
print('Clustering...', end='')
feature_procs = add_process_features(input_frame=processes_on_host,
                                     path_separator=params_dict['path_separator'])

feature_procs['accountNum'] = feature_procs.apply(lambda x: _string_score(x.Account), axis=1)
# you might need to play around with the max_cluster_distance parameter.
# decreasing this gives more clusters.
(clus_events, dbcluster, x_data) = dbcluster_events(data=feature_procs,
                                                    cluster_columns=['commandlineTokensFull', 
                                                                     'pathScore',
                                                                     'accountNum',
                                                                     'isSystemSession'],
                                                    max_cluster_distance=0.0001)
print('done')
print('Number of input events:', len(feature_procs))
print('Number of clustered events:', len(clus_events))
# display explicitly: a bare expression is only rendered when it is the last statement in a cell
display(clus_events[['TimeGenerated', 'LastEventTime',
                     'NewProcessName', 'CommandLine',
                     'ClusterSize', 'commandlineTokensFull',
                     'pathScore', 'isSystemSession']]
        .sort_values('ClusterSize', ascending=False))
print('done')

Out[106]:

Getting process events...done
Clustering...done
Number of input events: 1964
Number of clustered events: 150
done

In [107]:



selected_logon_cluster = get_selected_logon_cluster(logon_wgt.value)

def view_logon_sess(x=''):
    global selected_logon
    selected_logon = get_selected_logon(x)
    display(selected_logon)
    logonId = selected_logon['TargetLogonId'].iloc[0]
    sess_procs = (processes_on_host.query('TargetLogonId == @logonId | SubjectLogonId == @logonId')
                                          [['NewProcessName', 'CommandLine', 'TargetLogonId']]
                  .drop_duplicates())
    display(sess_procs)
    

items = selected_logon_cluster.sort_values('TimeGenerated').apply(lambda x: (f'{x.TargetUserName}:    '
                                        f'(logontype={x.LogonType})   '
                                        f'(timestamp={x.TimeGenerated})    '
                                        f'logonid={x.TargetLogonId}'),
                             axis=1).values.tolist()
sess_w = widgets.Select(options=items, description='Select logon instance to examine', **WIDGET_DEFAULTS)
    
interactive(view_logon_sess, x=sess_w)

Out[107]:

interactive(children=(Select(description='Select logon instance to examine', layout=Layout(width='95%'), optio…

Save Selected Session as Observation

In [108]:



if selected_logon is not None:
    display(Markdown('**Attacker Logon Session selected**'))
    display(selected_logon)
    logonid = selected_logon['TargetLogonId'].iloc[0]
    logon_time = selected_logon['TimeGenerated'].iloc[0]
    subj_account = entity.Account(src_event=selected_logon.iloc[0], role='subject')
    tgt_account = entity.Account(src_event=selected_logon.iloc[0], role='target')
    logon_session = entity.HostLogonSession(src_event=selected_logon.iloc[0])
    logon_session.Account = tgt_account
    logon_session.SessionId = logonid
    logon_session.Host = inv_host_entity
    display(Markdown('**Entities:**'))
    print('Subject Account:\n', subj_account)
    print('Target Account Session:\n', logon_session)
    
    add_observation(Observation(caption='Logon session identified for attacker IP',
                            description=f'Logon session for account {logon_session.Account.Name}',
                            item=logon_session,
                            link='examine_win_logon_sess'))

Out[108]:

Attacker Logon Session selected

Entities:

Subject Account:
 {'Name': 'MSTICAlertsWin1$', 'Sid': 'S-1-5-18', 'Type': 'account'}
Target Account Session:
 { 'Account': { 'LogonId': '0x1cfd78d',
               'Name': 'ian',
               'Sid': 'S-1-5-21-996632719-2361334927-4038480536-1120',
               'Type': 'account'},
  'EndTimeUtc': Timestamp('2019-02-15 19:54:24.590000'),
  'Host': { 'AdditionalData': { 'OSMajorVersion': '18',
                                'OSMinorVersion': '04',
                                'SourceComputerId': '44623fb0-bd5f-49ea-84d1-56aa11ab8a25'},
            'HostName': 'MSTICALERTSLXVM2',
            'OSFamily': 'Linux',
            'Type': 'host',
            'private_ips': [{"Address": "10.0.3.4", "Type": "ipaddress"}],
            'public_ips': [ {"Address": "104.211.30.1", "Location": {"CountryCode": "US", "CountryName": "United States", "State": "Virginia", "City": "Washington", "Longitude": -78.1704, "Latitude": 38.7163, "Type": "geolocation"}, "Type": "ipaddress"}]},
  'SessionId': '0x1cfd78d',
  'StartTimeUtc': Timestamp('2019-02-15 19:54:24.590000'),
  'Type': 'hostlogonsession'}


Processes for Selected LogonId

In [109]:

logonId = selected_logon['TargetLogonId'].iloc[0]
sess_procs = (processes_on_host.query('TargetLogonId == @logonId | SubjectLogonId == @logonId')
                                          [['TimeGenerated', 'NewProcessName', 'CommandLine']])

display(sess_procs)
add_observation(Observation(caption='Attacker commands on Victim 2',
                            description='Processes run in Attacker session',
                            item=sess_procs,
                            link='process_session'))

Out[109]:

Clustered Version of Previous Query - collapsing duplicates

In [110]:

display(clus_events.query('TargetLogonId == @logonId | SubjectLogonId == @logonId')
        [['TimeGenerated', 'NewProcessName', 'CommandLine', 'ClusterSize']])

Out[110]:

Optional (for the curious) - View clustering stats

In [111]:


# change True to False in the if statement to skip this
if True:
    proc_plot = sns.catplot(y="processName", x="commandlineTokensFull", 
                            data=feature_procs.sort_values('processName'),
                            kind='box', height=10)
    proc_plot.fig.suptitle('All Processes - Variability of Commandline Tokens', x=1, y=1)

    plt.rcParams['figure.figsize'] = (5, 15)
    clus_plot = clus_events[['processName', 
                             'ClusterId', 
                             'ClusterSize']].groupby(['processName', 
                                                      'ClusterId']).sum().plot.barh()
    plt.title('Clustered Processes - cluster size of each command line pattern');

Out[111]:


Other Events on the Host

In [112]:



all_events_base_qry = '''
SecurityEvent
| where Computer =~ '{host}'
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| where {where_filter}
'''
all_events_qry = all_events_base_qry.format(host=params_dict['host_name'],
                                            start=proc_query_times.start,
                                            end=proc_query_times.end,
                                            where_filter='EventID != 4688 and EventID != 4624')

%kql -query all_events_qry
all_events_df = _kql_raw_result_.to_dataframe()
display(all_events_df[['Account', 'Activity', 'TimeGenerated']].groupby(['Account', 'Activity']).count())

add_observation(Observation(caption='System account modifications during attack.',
                            description='Count of event types seen on system',
                            item=all_events_df[['Account', 'Activity', 'TimeGenerated']].groupby(['Account', 'Activity']).count(),
                            link='other_win_events'))

Out[112]:

In [113]:



import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError
SCHEMA='http://schemas.microsoft.com/win/2004/08/events/event'

# Function to convert EventData XML into dictionary and populate columns into DataFrame from previous query result
def parse_event_data(row):
    try:
        xdoc = ET.fromstring(row.EventData)
        col_dict = {elem.attrib['Name']: elem.text for elem in xdoc.findall(f'{{{SCHEMA}}}Data')}
        reassigned = set()
        for k, v in col_dict.items():
            if k in row and not row[k]:
                row[k] = v
                reassigned.add(k)
        if reassigned:
            #print('Reassigned: ', ', '.join(reassigned))
            for k in reassigned:
                col_dict.pop(k)
        return col_dict
    except ParseError:
        return None

all_events_df['EventProperties'] = all_events_df.apply(parse_event_data, axis=1)
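
To see what the parsing step produces, here is a minimal standalone sketch that converts one EventData XML string into a dictionary. The function name `event_data_to_dict` and the sample XML are made up for illustration; real EventData payloads carry many more fields. Unlike the cell above, this sketch also returns None for a missing (None) value, which is an added assumption about the input.

```python
import xml.etree.ElementTree as ET

SCHEMA = "http://schemas.microsoft.com/win/2004/08/events/event"


def event_data_to_dict(event_data_xml):
    """Return {Name: text} for each Data element, or None if parsing fails."""
    try:
        root = ET.fromstring(event_data_xml)
    except (ET.ParseError, TypeError):
        # ParseError: malformed XML; TypeError: value is None or not a string
        return None
    return {
        elem.attrib["Name"]: elem.text
        for elem in root.findall(f"{{{SCHEMA}}}Data")
        if "Name" in elem.attrib
    }


# Hypothetical EventData payload, for illustration only
SAMPLE = (
    f'<EventData xmlns="{SCHEMA}">'
    '<Data Name="TargetUserName">ian</Data>'
    '<Data Name="SubjectLogonId">0x3e7</Data>'
    '</EventData>'
)
props = event_data_to_dict(SAMPLE)
print(props)
```

The namespace has to be included in the `findall` path because the Data elements inherit the default namespace declared on the root element.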


Office 365 Activity

In [114]:



# set the origin time to the time of our alert
o365_query_times = nbtools.QueryTime(units='hours', origin_time=security_alert.origin_time,
                           before=1, after=10, max_before=20, max_after=20)
o365_query_times.display()

Out[114]:

HTML(value='<h4>Set query time boundaries</h4>')

HBox(children=(DatePicker(value=datetime.date(2019, 2, 18), description='Origin Date'), Text(value='15:29:28',…

VBox(children=(IntRangeSlider(value=(-1, 10), description='Time Range (hour):', layout=Layout(width='80%'), ma…

Execute queries to get the data

In [118]:



print('Running queries...', end=' ')
# Queries
ad_changes_query = '''
OfficeActivity
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| where RecordType == 'AzureActiveDirectory'
| where Operation in ('Add service principal.',
                      'Change user password.', 
                      'Add user.', 
                      'Add member to role.')
| where UserType == 'Regular' 
| project OfficeId, TimeGenerated, Operation, OrganizationId, 
          OfficeWorkload, ResultStatus, OfficeObjectId, 
          UserId = tolower(UserId), ClientIP, ExtendedProperties
'''.format(start = o365_query_times.start, end=o365_query_times.end)
%kql -query ad_changes_query
ad_changes_df = _kql_raw_result_.to_dataframe()


office_ops_query = '''
OfficeActivity
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
| where RecordType in ("AzureActiveDirectoryAccountLogon", "AzureActiveDirectoryStsLogon")
| extend UserAgent = extractjson("$[0].Value", ExtendedProperties, typeof(string))
| union (
    OfficeActivity 
    | where TimeGenerated >= datetime({start})
    | where TimeGenerated <= datetime({end})
    | where RecordType !in ("AzureActiveDirectoryAccountLogon", "AzureActiveDirectoryStsLogon")
)
| where UserType == 'Regular'
'''.format(start = o365_query_times.start, end=o365_query_times.end)
%kql -query office_ops_query
office_ops_df = _kql_raw_result_.to_dataframe()

office_ops_summary_query = '''
let timeRange=ago(30d);
let officeAuthentications = OfficeActivity
| where TimeGenerated >= timeRange
| where RecordType in ("AzureActiveDirectoryAccountLogon", "AzureActiveDirectoryStsLogon")
| extend UserAgent = extractjson("$[0].Value", ExtendedProperties, typeof(string))
| where Operation == "UserLoggedIn";
officeAuthentications
| union (
    OfficeActivity 
    | where TimeGenerated >= timeRange
    | where RecordType !in ("AzureActiveDirectoryAccountLogon", "AzureActiveDirectoryStsLogon")
)
| where UserType == 'Regular'
| extend RecordOp = strcat(RecordType, '-', Operation)
| summarize OpCount=count() by RecordType, Operation, UserId, UserAgent, ClientIP, bin(TimeGenerated, 1h)
// render timeline
'''.format(start = o365_query_times.start, end=o365_query_times.end)
%kql -query office_ops_summary_query
office_ops_summary_df = _kql_raw_result_.to_dataframe()
# %kql -query office_ops_query
# office_ops_df = _kql_raw_result_.to_dataframe()

office_logons_query = '''
let timeRange=ago(30d);
let officeAuthentications = OfficeActivity
| where TimeGenerated >= timeRange
| where RecordType in ("AzureActiveDirectoryAccountLogon", "AzureActiveDirectoryStsLogon")
| extend UserAgent = extractjson("$[0].Value", ExtendedProperties, typeof(string))
| where Operation == "UserLoggedIn";
let lookupWindow = 1d;
let lookupBin = lookupWindow / 2.0; 
officeAuthentications | project-rename Start=TimeGenerated
| extend TimeKey = bin(Start, lookupBin)
| join kind = inner (
    officeAuthentications
    | project-rename End=TimeGenerated
    | extend TimeKey = range(bin(End - lookupWindow, lookupBin), bin(End, lookupBin), lookupBin)
    | mvexpand TimeKey to typeof(datetime)
) on UserAgent, TimeKey
| project timeSpan = End - Start, UserId, ClientIP , UserAgent , Start, End
| summarize dcount(ClientIP) by  UserAgent
| where dcount_ClientIP > 1
| join kind=inner (  
officeAuthentications
| summarize minTime=min(TimeGenerated), maxTime=max(TimeGenerated) by UserId, UserAgent, ClientIP
) on UserAgent
'''
%kql -query office_logons_query
office_logons_df = _kql_raw_result_.to_dataframe()

print('done.')

Out[118]:

Running queries... done.

Any IP Addresses in our alert IPs that match Office Activity?

In [119]:

# Any IP Addresses in our alert IPs that match?


for ip in alert_ip_entities:
    susp_o365_activities = office_ops_df[office_ops_df['ClientIP'] == ip.Address]
    susp_o365_summ = (susp_o365_activities
                         [['OfficeId', 'UserId', 'RecordType', 'Operation']]
                         .groupby(['UserId', 'RecordType', 'Operation']).count()
                         .rename(columns={'OfficeId': 'OperationCount'}))
    
    display(Markdown(f'### Activity for {ip.Address}'))
    
    if len(susp_o365_summ) > 0:
        display(susp_o365_summ)
    
        add_observation(Observation(caption=f'O365 activity from suspected attacker IP {ip.Address}',
                                    description=f'Summarized operation count for each user/service/operation type',
                                    item=susp_o365_summ,
                                    link='o365_match_ip'))
    else:
        display(Markdown('No activity detected'))

Out[119]:

Activity for 23.97.60.214

No activity detected

In [120]:



for susp_ip in [ip.Address for ip in alert_ip_entities]:

    display(Markdown(f'### Timeline of operations originating from suspect IP Address: {susp_ip}'))
    display(Markdown(f'**{susp_ip}**'))
    suspect_ip_ops = office_ops_df[office_ops_df['ClientIP'] == susp_ip]
    if len(suspect_ip_ops) == 0:
        display(Markdown('No activity detected'))
        continue
    sel_op_type='FileDownloaded'
    nbdisp.display_timeline(data=suspect_ip_ops, title=f'Operations from {susp_ip} (all=blue, {sel_op_type}=green)',
                             overlay_data=suspect_ip_ops.query('Operation == @sel_op_type'),
                            source_columns=['UserId', 'RecordType', 'Operation'])
    
    # Uncomment line below to see all activity
    # display(suspect_ip_ops.sort_values('TimeGenerated', ascending=True).head())

Out[120]:

Timeline of operations originating from suspect IP Address: 23.97.60.214

23.97.60.214

No activity detected

Look for high-frequency operations - like automated or bulk uploads/downloads

Anything above or approaching 1 operation/sec is likely an automated or bulk operation

In [ ]:



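# Note the comma after 'Operation': without it Python concatenates the two
# adjacent string literals into a single, non-existent column name.
timed_slice_ops = office_ops_df[['RecordType', 'TimeGenerated', 'Operation',
       'OrganizationId', 'UserType', 'OfficeWorkload',
       'ResultStatus', 'OfficeObjectId', 'UserId', 'ClientIP', 'Start_Time']]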
timed_slice_ops2 = timed_slice_ops.set_index('TimeGenerated')

hi_freq_ops = (timed_slice_ops2[['UserId', 'ClientIP', 'Operation', 'RecordType']]
                .groupby(['UserId', 'ClientIP', 'RecordType', 'Operation']).resample('10S').count()
                .query('RecordType > 10')
                .drop(['ClientIP', 'UserId', 'RecordType'], axis=1)
                .assign(OpsPerSec = lambda x: x.Operation / 10)
                .rename(columns={'Operation': 'Operation Count'}))

if len(hi_freq_ops) > 0:
    display(hi_freq_ops)
    add_observation(Observation(caption=f'O365 bulk/high freq operations seen',
                                    description=f'Summarized operation count bulk actions',
                                    item=hi_freq_ops,
                                    link='o356_high_freq'))
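
The windowed counting used above can be illustrated on a small synthetic frame. This is a minimal sketch under stated assumptions: the events, the `contoso.com` user names and the `WINDOW_SECS` constant are hypothetical, and it uses `pd.Grouper` instead of the `groupby(...).resample(...)` chain in the cell above.

```python
import pandas as pd

# Synthetic events (hypothetical data): one user issues 12 downloads in
# under 6 seconds, another performs two operations a minute apart.
burst = pd.DataFrame({
    'TimeGenerated': pd.date_range('2019-02-18 15:00:00', periods=12,
                                   freq=pd.Timedelta(milliseconds=500)),
    'UserId': 'bulk@contoso.com',
    'Operation': 'FileDownloaded',
})
quiet = pd.DataFrame({
    'TimeGenerated': pd.to_datetime(['2019-02-18 15:00:05', '2019-02-18 15:01:05']),
    'UserId': 'user@contoso.com',
    'Operation': 'FileAccessed',
})
events = pd.concat([burst, quiet], ignore_index=True)

WINDOW_SECS = 10  # size of each counting window

# Count operations per user and operation type in fixed 10-second windows
counts = (
    events
    .groupby(['UserId', 'Operation',
              pd.Grouper(key='TimeGenerated', freq=pd.Timedelta(seconds=WINDOW_SECS))])
    .size()
    .rename('OperationCount')
    .reset_index()
)
counts['OpsPerSec'] = counts['OperationCount'] / WINDOW_SECS

# Keep only windows with more than 10 operations, i.e. more than 1 op/sec
hi_freq = counts[counts['OperationCount'] > 10]
print(hi_freq)
```

Only the burst of downloads survives the filter; the two isolated operations fall into separate windows with a count of one each.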

Other Background Data for O365

In [ ]:



display(Markdown('### IPs and User Agents - frequency of use'))
office_ops_df['UserId'] = office_ops_df['UserId'].str.lower()
display(Markdown('Distinct IPs by num of operations'))
display(office_ops_df[['ClientIP', 'Operation']].groupby(['ClientIP']).count())
display(Markdown('Distinct UserAgents by num of operations'))
office_ops_df[['UserAgent', 'Operation']].groupby(['UserAgent']).count()

In [ ]:



off_ip_locs = (office_ops_df[['ClientIP']]
                   .drop_duplicates()
                   .apply(lambda x: 
                          iplocation.lookup_ip(ip_address=x.ClientIP)[1]
                          if x.ClientIP and x.ClientIP != '<null>' else None, axis=1)
                   .tolist())
ip_locs = [ip_list[0] for ip_list in off_ip_locs if ip_list]
    
flow_map = create_ip_map()
display(HTML('<h3>External IP Addresses seen in Office Activity</h3>'))
display(HTML('Numbered circles indicate multiple items - click to expand.'))


icon_props = {'color': 'purple'}
flow_map = add_ip_cluster(folium_map=flow_map,
                            ip_entities=ip_locs,
                            **icon_props)
display(flow_map)

display(Markdown('<p style="color:red">Warning: the folium mapping library '
                 'does not display correctly in some browsers.</p><br>'
                 'If you see a blank image please retry with a different browser.'))

In [ ]:



with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    display(Markdown('### Change in rate of Activity Class (RecordType) and Operation'))
    sns.relplot(data=office_ops_summary_df, x='TimeGenerated', y='OpCount', kind='line', aspect=2, 
                hue='RecordType')
    sns.relplot(data=office_ops_summary_df.query('RecordType == "SharePointFileOperation"'), 
                x='TimeGenerated', y='OpCount', hue='Operation', kind='line', aspect=2)

In [ ]:



with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    display(Markdown('### Identify Users/IPs with largest operation count'))
    office_ops_summary_df['UserId'] = office_ops_summary_df['UserId'].str.lower()

    sns.catplot(data=office_ops_summary_df, x='UserId', y='OpCount', 
                hue='Operation', aspect=2).set_xticklabels(rotation=30)
    # Expressions inside a with-block are not auto-rendered, so display explicitly
    display(office_ops_summary_df.pivot_table('OpCount', index=['ClientIP', 'UserId'], 
                                              columns='Operation').style.bar(color='orange', align='mid'))

Extract distinctive events from O365 Operations

In [ ]:



from msticpy.sectools.eventcluster import (dbcluster_events, 
                                           add_process_features, 
                                           char_ord_score,
                                           token_count,
                                           delim_count)

feature_office_ops = office_ops_df.copy()
feature_office_ops['ip_num'] = feature_office_ops.apply(lambda x: char_ord_score(x, 'ClientIP'), axis=1)
feature_office_ops['ua_tokens'] = feature_office_ops.apply(lambda x: char_ord_score(x, 'UserAgent'), axis=1)
feature_office_ops['oid_tokens'] = feature_office_ops.apply(lambda x: char_ord_score(x, 'OfficeObjectId'), axis=1)

# You may need to tune the max_cluster_distance parameter:
# decreasing it produces more clusters.
(clustered_ops, dbcluster, x_data) = dbcluster_events(data=feature_office_ops,
                                                      cluster_columns=['ip_num', 
                                                                     'ua_tokens', 
                                                                     'oid_tokens'],
                                                      time_column='TimeGenerated',
                                                      max_cluster_distance=0.0001)
print('Number of input events:', len(feature_office_ops))
print('Number of clustered events:', len(clustered_ops))
(clustered_ops[['TimeGenerated', 'RecordType',
                'Operation', 'UserId', 'UserAgent', 'ClusterSize',
                'OfficeObjectId']]
    .query('ClusterSize <= 2')
    .sort_values('ClusterSize', ascending=True))
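
The idea behind the cell above is to reduce each string column to a number so that near-identical events collapse together and the rare ones stand out. The sketch below is a simplified stand-in, not the msticpy implementation: it groups on exact feature equality rather than running DBSCAN, and the `ord_sum` helper and the sample rows are hypothetical.

```python
import pandas as pd


def ord_sum(text):
    """Sum of character code points: a crude numeric fingerprint of a string."""
    return sum(ord(ch) for ch in str(text))


# Hypothetical operations, invented for illustration
ops = pd.DataFrame({
    'ClientIP': ['10.0.0.1'] * 4 + ['203.0.113.7'],
    'UserAgent': ['Mozilla/5.0'] * 4 + ['curl/7.58.0'],
    'Operation': ['FileAccessed'] * 4 + ['FileDownloaded'],
})

# Derive one numeric feature per string column
features = ops.assign(
    ip_num=ops['ClientIP'].map(ord_sum),
    ua_num=ops['UserAgent'].map(ord_sum),
)

# Events with identical features form one "cluster"; record its size
features['ClusterSize'] = (
    features.groupby(['ip_num', 'ua_num'])['Operation'].transform('count')
)

# Keep one representative per cluster, then show the rare ones
collapsed = features.drop_duplicates(['ip_num', 'ua_num'])
rare = collapsed[collapsed['ClusterSize'] <= 2]
print(rare[['ClientIP', 'UserAgent', 'Operation', 'ClusterSize']])
```

A character-sum fingerprint can collide for different strings, which is one reason to treat the resulting clusters as a triage aid rather than a definitive grouping.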

Summary

In [ ]:

for observation in observation_list.values():
    display_observation(observation)

Appendices

Available DataFrames

In [ ]:

print('List of current DataFrames in Notebook')
print('-' * 50)
current_vars = list(locals().keys())
for var_name in current_vars:
    if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):
        print(var_name)

Saving Data to Excel

To save the contents of a pandas DataFrame to an Excel spreadsheet, use the following syntax:

with pd.ExcelWriter('myWorksheet.xlsx') as writer:
    my_data_frame.to_excel(writer, sheet_name='Sheet1')

Platform Requirements

Python Version: Python 3.6 (including Python 3.6 - AzureML)
Required Packages: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, scikit_learn, dnspython, ipwhois, folium, maxminddb_geolite2
Platforms Supported:

Azure Notebooks Free Compute
Azure Notebooks DSVM
OS Independent

Data Sources Required:

Log Analytics - SecurityAlert, SecurityEvent (EventIDs 4688 and 4624/25), AuditLog_CL (Linux Auditd), OfficeActivity, AzureNetworkAnalytics_CL, Heartbeat
(Optional) - VirusTotal (with API key)
